ABSTRACT



This thesis was undertaken to examine an acoustical signal processing test bed, similar to the one installed at ...

ABBREVIATIONS



A1      AP-120B Adder Register One
DMA     Direct Memory Access
DPX     AP-120B Data Pad X
DPY     AP-120B Data Pad Y
FA      AP-120B Adder Result Register
FCB     MAP-300 Function Control Block
FFT     Fast Fourier Transform
FIFFT   Forward/Inverse Fast Fourier Transform Test
FIFO    First In First Out
FL      AP-120B Adder Results Less Than Zero
FM      AP-120B Multiplier Result Register
FMT     AP-120B Format Register
FN      AP-120B Function Register
FO      AP-120B Adder Exponent Overflow
FU      AP-120B Adder Exponent Underflow
FZ      AP-120B Adder Results Equal Zero
HMA     AP-120B Host Memory Access Register
HIC     MAP-300 Host Interface Controller
HIM     MAP-300 Host Interface Module
HIS     MAP-300 Host Interface Scroll
IOS     MAP-300 Input/Output Scroll
IQ      MAP-300 Input Queue
LIFO    Last In First Out
M1      AP-120B Multiplier Unit Number One
M2      AP-120B Multiplier Unit Number Two
MAP     Macro Array Processor
MAP-300 CSPI Macro Array Processor Model 300
MD      AP-120B Main Data Memory Output Buffer Register
MI      AP-120B Main Data Memory Input Buffer
MOS     Metal Oxide Semiconductor
MTBF    Mean Time Between Failure
MTTR    Mean Time To Repair
NOP     No Operation
OQ      MAP-300 Output Queue
P0-P3   MAP-300 Program Counters One Through Three
P       MAP-300 Multiplier Results Register
PIOC    AP-120B Programmable Input/Output Channel
PIOP    AP-120B Programmable Input/Output Processor
R       MAP-300 Adder Results Register
RAF     MAP-300 Read Address FIFO
RAMP    Reliability And Maintenance Program
RFFT    Real to Complex FFT
RFFTSC  Real FFT Scale and Format
ROM     Read-Only Memory
S-Pad   AP-120B Scratch Pad
SMAP-II MAP-300 Systematic Notation For Processing Version II
SPFN    AP-120B S-Pad Output Buffer Register
SRA     Subroutine Return Address
SWR     AP-120B Switch Register
SYSFLG  MAP-300 System Flag Register
TM      AP-120B Table Memory
TMA     AP-120B Table Memory Address Register
TMRAM   AP-120B Random Access Table Memory
VAC     Volts Alternating Current
VMUL    Vector Multiply
WAF     MAP-300 Write Address FIFO
WC      AP-120B Word Count Register
INTRODUCTION



The purpose of this study is to begin evaluation of a proposed signal-processing test bed similar to the test bed being installed at the Naval Postgraduate School, Monterey, California. The basic test bed consists of an analog subsystem (fig 1), a signal-processing subsystem (fig 2), a data-processing subsystem (fig 3) and a display subsystem (fig 4), to be used for general-purpose Naval research.

The analog subsystem of the test bed was designed for signal reception and conditioning. This is basically accomplished by a 128-line input into a programmed matrix switch which emits 52 lines of output. These 52 lines continue through a program-controlled filter, which issues the output from the subsystem.

The signal-processing subsystem receives results from the analog subsystem via an AM-5^00 A/D converter. This information can then be stored in an Ampex Megastore unit to be processed later by one MAP-300 array processor. A PDP-11/34 computer controls the mass storage device, the array processor and input functions. Output is directed to the data-processing subsystem.

The data-processing subsystem receives the processed data and controls the operation of the display subsystem.
ANALOG SUBSYSTEM

Figure 1

SIGNAL PROCESSING SUBSYSTEM

Figure 2

DATA PROCESSING SUBSYSTEM

Figure 3

DISPLAY SUBSYSTEM



Display devices presently include a Ramtek 9300 Video Display Unit (color and shades of gray), the Versatec 1600A printer/plotter and an EPC 2300 Gram Writer.

The goal of this study was to examine the major system components (computers, array processors and major data paths) to determine their feasibility for various uses and to suggest possible alternative methods, especially in the real-time environment. The basic task of the test bed was assumed to be general, with no specific tasks suggested, although it was recognized that many uses and data rates may be utilized.

Chapter II discusses specific computer manufacturers and computer types. Chapters III, IV and V deal with the two most popular general-purpose array processors on the market, discussing the pros and cons of each. Chapter VI gives final conclusions and recommendations concerning the proposed test bed.

II. COMPUTERS

A. GENERAL 

For the test bed evaluation, choosing the proper computer is important since a varying amount of computational power is required for each subsystem. Also, a gamut of functions and uses may be tried, necessitating a system that can realistically emulate many speed, cost and memory constraints. A common and popular system affords better software support while still maintaining a low price. The ability to rely on system support is an important issue when considering long-term use. A popular system tends to develop newer, more efficient software packages earlier and more frequently than less-used systems.

For large array processing applications with many display devices, the ideal situation would be for one computer to initially load the array processor and then act as a "whole system" monitor and statistician. It could also perform the information-gathering function while another computer acts as the output processor for the array processor and controls the display devices. That situation would be similar to that of a test bed, where flexibility may be the key and being computer-bound would be highly undesirable and could unjustly influence the evaluation of the array processor. An ultimate goal might be to choose the smallest computer capable of operating the array processor and associated display devices in the desired fashion while providing for product expansion. It is realized that for test and research activities more computing power may be necessary than would be needed for normal production activities.

In October 1975, the Computer Family Architecture Selection Committee was formed to evaluate computer architecture candidates as a basis for a family of software-compatible military computers. Ten Army and 17 Navy organizations were represented on the selection committee [11]. The purpose was to select an architecture which could be used as a standard, had a proven instruction set and could be used in advanced technologies.

B. PDP-11 FAMILY 



The Committee voted that the PDP-11 had the best architecture for use in the Military Computer Family. However, it had a small address space and possible floating point instruction compatibility problems with existing systems. The IBM System 370 was ranked second, with the Interdata 8/32 ranked third [12]. The Digital Equipment Corporation PDP-11 series provided a popular example of both price and performance excellence in available computer systems. Their popularity is evidenced by the shipment of 10,000 PDP-11/04 and 10,000 PDP-11/34 computers as of 1975 and 1976, respectively [26].

The relevant PDP-11 computers considered were the PDP-11/04, PDP-11/34, PDP-11/45, PDP-11/55, PDP-11/60 and PDP-11/70 (listed from least powerful to most powerful). What follows is a brief description of each system. Unless otherwise stated, it will be assumed that a more powerful system contains all the features of the less powerful systems. The PDP-11/03 and the LSI-11 series were not considered because they lack the advantages of the UNIBUS [28].

1. PDP-11/04

The PDP-11/04 is the smallest computer of the PDP-11 series. Its entire central processing unit is contained on one board, leaving considerable room for expansion in the unused chassis area. The system contains self-test logic to determine system operability every time the processor has power applied, the console emulator is used or the bootstrap routines are initiated. The console emulator allows the operator to control the system from a terminal without physically throwing switches or reading lights on the front panel of the unit. The bootstrap loader automatically restarts the system from various peripheral devices without the need to throw switches physically. Memory size varies from 8K bytes to 56K bytes (8 bits = 1 byte) of either MOS (metal oxide semiconductor) or core type, with an average access time of 500 nanoseconds and a system cycle time of 725 nanoseconds [30]. A typical cost of this system is ...

2. PDP-11/34



The PDP-11/34 is the next size of the PDP-11 family and is the lowest model to contain a memory management routine that provides program protection, so that user programs cannot access or change system memory space. (In the 11/04 it is the programmer's responsibility to maintain and protect this area.) Memory management also allows virtual memory paging of up to 16 pages ranging in size from 64 bytes to 8K bytes, for a total possible memory of 256K bytes, of which 128K is physical. (The highest 4K of address space on the PDP-11/34/45/55/60/70 is used for registers that store I/O data or the status of individual peripheral devices. This means that the 11/34 can physically address 124K bytes but virtually address 256K bytes.) The 11/34 allows both core memory and MOS memory to be used concurrently.

The PDP-11/34 also contains a memory option called cache memory, which is a 2K high-speed (300-nanosecond cycle time) memory used to store a copy of the most recently selected portions of main memory, affording faster access to instructions and data. The "hit" ratio, the fraction of accesses for which the next item is already resident in cache, is approximately 8p percent for the 11/34. Time is saved because there is less area to access, and therefore less search time, and because data transmission is shorter and less complicated.

Since MOS memory is volatile (it loses information when power is removed), the 11/34 has a battery back-up option which will retain information in the MOS memory for approximately two hours. The PDP-11/34 can operate in two modes, Kernel and User. This two-mode concept is important for security since the User mode is prevented from executing certain instructions that could modify the Kernel program, halt the computer or use memory space assigned to the Kernel or other users. Monitoring and supervisory routines are executed in the Kernel mode. The Kernel/User concept is important because, if the Kernel can be made secure, overall security of the operating system against accidental harm is much easier to achieve. Prices range from $11,080 to $53,800 [29].



3. PDP-11/45



The PDP-11/45 system is designed for speed. The high-speed central processor allows program execution of three million instructions per second and has either 300-nanosecond bipolar memory or 980-nanosecond core memory available. MOS memory is also available as an "add-on" option. Total memory space is the same as the 11/34. There is an optional floating point processor to handle double precision arithmetic. The system is especially good for multiple-task applications; otherwise it is the same as the 11/34. The price is $41,800 [29].



4. PDP-11/55



The PDP-11/55 system improves on the 11/45 by adding a dual bus structure that allows intermixing core and bipolar memory (up to 248K with memory management) to optimize system performance. Two separate semiconductor controllers allow simultaneous data transfer for increased system throughput. Both the 11/45 and 11/55 hardware have been optimized toward a multiprogramming environment by adding a third mode, Supervisor, to control system operation while properly handling multi-user operations [30]. The price is $50,400 to $80,780 [29].

5. PDP-11/60

The PDP-11/60 system bridges the gap between the mid-range mini and the more powerful mini. With the 11/60 we see the first capability to microprogram and four levels of priority interrupts. The system was also designed with the engineering trade-off between ease of maintenance and reliability in mind. A system that is very difficult to repair after failure may be less useful than a system that fails more often but is easy to repair. The availability of a system is the mean time between failure divided by the sum of the mean time between failure and the mean time to repair, MTBF/(MTBF + MTTR) [30]. Digital Equipment Corporation has tried to allow for a more complex architecture (and a probable higher failure rate) by providing a Reliability and Maintenance Program (RAMP) software package to help locate software and hardware errors, decreasing the MTTR and thereby increasing availability. The price ranges from ... to over $200,000.
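The availability ratio can be checked numerically. The Python sketch below is a minimal illustration. The two MTBF/MTTR pairs are hypothetical numbers, not DEC data, chosen only to show how a system that fails more often but is repaired quickly can come out ahead.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is usable: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures for illustration only (not DEC data):
reliable_but_hard_to_repair = availability(1000.0, 10.0)      # about 0.9901
less_reliable_but_easy_to_repair = availability(500.0, 1.0)   # about 0.9980

print(f"{reliable_but_hard_to_repair:.4f}")
print(f"{less_reliable_but_easy_to_repair:.4f}")
```

In this example, halving the MTBF while cutting the MTTR tenfold still raises availability. RAMP aims for the same effect by lowering the MTTR.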

6. PDP-11/70

The PDP-11/70 is the largest of the PDP-11 series and gives the power of a large computer at the cost ($63,000 to $14^,880 [29]) of a minicomputer. It was designed to operate in high-performance systems and is ideally suited for real-time systems because of its high speed of execution and the 80-95 percent "hit" ratio of its cache memory. Addressing of up to four megabytes of physical memory is theoretically possible with the 22-bit addresser, although 256K of this 4M must be used for UNIBUS referencing. (The UNIBUS can only address 18 bits; therefore the memory management routine must convert the 4-megabyte addresses as if they were virtual locations.) At the present time, however, only 2M of physical memory can actually be accommodated by the UNIBUS. There is the option to use 64-bit floating point numbers in calculations. With two megabytes of main memory there is little concern for memory constraints in a multi-task environment. The option of attaching high-speed mass storage devices to the central processing unit through dedicated paths is available. The system has eight levels of priority and a large amount of flexibility in its programming, making it possible to run several levels of display devices under varying loading conditions.
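The address-space figures quoted for the PDP-11 follow directly from powers of two. Here is a short Python check, assuming byte addressing as on the PDP-11. The helper name is our own.

```python
KB = 1024
MB = 1024 * KB


def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with `address_bits` bits."""
    return 1 << address_bits


unibus_space = address_space_bytes(18)     # 18-bit UNIBUS addresses
pdp11_70_space = address_space_bytes(22)   # 22-bit PDP-11/70 physical addresses

# Per the text, the top 256K of the 4M space is used for UNIBUS referencing.
remaining_for_memory = pdp11_70_space - unibus_space

print(unibus_space // KB, "K bytes")          # 256 K bytes
print(pdp11_70_space // MB, "M bytes")        # 4 M bytes
print(remaining_for_memory // KB, "K bytes")  # 3840 K bytes
```

The last figure is only the arithmetic upper bound implied by the 22-bit addresser. As noted above, just 2M could actually be attached at the time.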
III. ARRAY PROCESSOR



An array processor is a unit capable of performing floating point operations on large data arrays or data streams. It usually operates as a peripheral device to a "host" computer system and best performs the repetitious, reiterative operations requiring a large number of summations and multiplications typically encountered in matrix calculations such as correlations and fast Fourier transforms. This system is special purpose and cannot "think" for itself, since it has no executive functions except those necessary to control the mathematics required to perform additions, multiplications and data movement [16].
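A correlation shows why such workloads consist almost entirely of multiply-add operations. The following is a direct-form cross-correlation in Python. It is an illustrative sketch: the function name and the fixed-lag interface are our own. It also counts the multiply-adds an array processor would be asked to pipeline.

```python
def cross_correlation(x, y, max_lag):
    """Direct-form r[k] = sum over n of x[n] * y[n + k], for k = 0..max_lag.

    Returns the correlation values and the number of multiply-adds performed.
    """
    result = []
    multiply_adds = 0
    for k in range(max_lag + 1):
        acc = 0.0
        for n in range(min(len(x), len(y) - k)):
            acc += x[n] * y[n + k]   # one multiplication and one summation
            multiply_adds += 1
        result.append(acc)
    return result, multiply_adds


values, work = cross_correlation([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 2)
print(values, work)  # [14.0, 8.0, 3.0] 6
```

For N samples and N lags, the count grows roughly as N**2 / 2. This kind of regular, repetitive arithmetic is what a host computer offloads to an array processor.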



With an array processor, large transforms can be achieved, limited only by memory capacity. These transforms can be done faster than in the normal CPU since the array processor performs only one function at a time (here "function" is used in the broader sense, as in transposition) and there is no need for the normal overhead control logic of a general purpose computer [8]. This is more advantageous than a special purpose computer in that an array processor can be programmed to execute various array processing applications and can also act as a peripheral. Ideally, a system could handle arrays of any size, including very large arrays if the situation warranted. This is theoretically possible by using sequential processing and stringing a series of array processors together, each performing a specific operation. That would only be good, however, for applications that do not need the results of data processed in step N to be used in step N-1. Using one array processor, efficient and sufficient performance on large arrays is possible because of the special architecture and memory of the array processor.



Two general purpose array processors presently seem to dominate the market. These are the CSP Inc. MAP-300 (Macro Array Processor) and the Floating Point Systems AP-120B. While the basic function of each is similar, the actual operation is quite different.



The theoretical advantages and disadvantages of each processor will be discussed in detail, comparing architecture, operational characteristics, software support and programmability. Chapter VII, Conclusions and Recommendations, will discuss the actual problems encountered with the installation of the MAP-300 system to be used in the evaluation here at the Naval Postgraduate School.

IV. THE AP-120B ARRAY PROCESSOR

The AP-120B Array Processor (fig 5) is manufactured by Floating Point Systems Inc., Portland, Oregon. It operates synchronously using a 167-nanosecond master clock cycle, with a 50 percent timing safety margin in every cycle for worst-case temperature and voltage. The system uses pre-conditioned medium-scale integrated circuitry, large-scale integrated circuitry and transistor-transistor logic. The AP-120B is capable of operating in temperatures from 10 to 40 degrees centigrade at 0 to 90 percent relative humidity. This processor can also operate using any one of these power options: 105/125 VAC at 120 amps, 160/228 VAC at 10 amps or 210/250 VAC at 10 amps, with either 50/60 hertz or 50/400 hertz available [7].



The AP-120B employs a technique known as pipeline processing to increase throughput. Pipeline processing combines elements of both sequential processing and parallel processing. A single basic processor, such as an adder, is logically divided into integral units. Each unit performs a specific and separable function while another unit of the adder simultaneously performs another part of the addition task. When one unit completes its part of a task, the task moves on to the next step in the sequence, and the just-vacated section of the adder is filled with the next task in the queue. Throughput is increased by ensuring that the entire system is always full. This technique works with both the adder and the multiplier in the AP-120B. Pipelining is good for vector operations since the vector elements are basically independent, and the result for element N is not needed before element N+1 can be started. However, scalar operations are basically sequential and cannot make use of pipelining [1]. By carefully considering every operation, especially those in loops, the programmer can fit more operations into a given time interval by pipelining than would be possible using standard sequential techniques. The time is generally limited by the multiplication time [W4].

General AP-120B Block Diagram

Figure 5

The AP-120B instruction word is up to 64 bits long and can perform a maximum of ten different operations in a single cycle. For example, an add, a multiply, a move to and from each data pad (there are two) and an address increment or decrement can all be performed in the same cycle. Any one operation or combination of the above can be performed as long as the required resource is not being used by another operation. (Some operations take several cycles and "lock out" the resource until they are complete.) It is the programmer's obligation to ensure that all required resources are available when they are requested, or else they will be lost [7]. For example, a read from a data pad takes at least two cycles. If cycle N needs to read from Data Pad X and cycle N-1 has already initiated a read from Data Pad X, the entire instruction word for cycle N is delayed one cycle while waiting for the resource to become available. This ability to perform more than one basic operation per cycle allows a theoretical 30 million instructions per second to be executed. This rate can never be fully attained, except possibly for short bursts, because of memory size limitations and because algorithms do not need ten operations per instruction word for sustained periods [36]. Since some of these operations are housekeeping functions, the maximum number of arithmetic operations per second theoretically possible is twelve million for vectors and five million for scalars. Scalar speed is much lower since it requires sequential processing and cannot take advantage of pipelining [1].
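The twelve-million figure for vectors can be reproduced from the cycle time alone. The sketch assumes, as the text implies, that once both pipelines are full the adder and the multiplier each deliver one result per cycle. It does not attempt to model the scalar figure.

```python
CYCLE_SECONDS = 167e-9   # AP-120B machine cycle quoted in the text

instruction_words_per_second = 1 / CYCLE_SECONDS
# Assumption: with both pipelines full, one adder result and one multiplier
# result emerge per cycle, i.e. two arithmetic operations per instruction word.
peak_vector_ops_per_second = 2 * instruction_words_per_second

print(f"{instruction_words_per_second / 1e6:.2f} million instruction words/s")  # 5.99
print(f"{peak_vector_ops_per_second / 1e6:.1f} million arithmetic ops/s")       # 12.0
```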

The AP-120B uses a 38-bit data word, which Floating Point Systems contends gives better accuracy than the 32-bit word commonly used by other systems [7]. This 38-bit word consists of a ten-bit biased exponent and a 28-bit two's complement mantissa, which allows numbers in the range 3.7 x 10^-155 to 6.7 x 10^153 to be represented. The 28-bit mantissa allows for extensive calculations without significant truncation errors. The maximum relative error is approximately 7.5 x 10^-9 per arithmetic operation, or about 8 decimal digits of accuracy. Floating Point Systems Inc. also employs a technique known as convergent rounding, which they assert forces the average roundoff error to approach zero.
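The quoted range and precision can be reconstructed from the field widths. This Python sketch assumes an exponent range of -512 to +511 and a normalized mantissa magnitude in [0.5, 1). That is our reconstruction, chosen because it reproduces the limits quoted above, not a statement of the FPS specification.

```python
import math

EXP_BITS = 10    # biased exponent field
MANT_BITS = 28   # two's complement mantissa, sign bit included

# Assumption: exponent range -512..+511 and a normalized mantissa whose
# magnitude lies in [0.5, 1); this reproduces the limits quoted in the text.
EXP_MIN = -(1 << (EXP_BITS - 1))       # -512
EXP_MAX = (1 << (EXP_BITS - 1)) - 1    # +511
FRACTION_BITS = MANT_BITS - 1          # 27 magnitude bits after the sign

smallest_normalized = 0.5 * 2.0 ** EXP_MIN
largest = (1.0 - 2.0 ** -FRACTION_BITS) * 2.0 ** EXP_MAX

# Rounding to nearest loses at most half a unit in the last place (2**-28);
# dividing by the smallest normalized magnitude (0.5) gives 2**-27.
max_relative_error = 2.0 ** -FRACTION_BITS
decimal_digits = -math.log10(max_relative_error)

print(f"{smallest_normalized:.1e} to {largest:.1e}")   # 3.7e-155 to 6.7e+153
print(f"{max_relative_error:.1e}, about {decimal_digits:.0f} digits")  # 7.5e-09, about 8 digits
```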
The AP-120B does not contain the normal bus structure of other array processors. Instead, it uses dedicated 38-bit data paths for the movement of data. There are two paths available to the adder (one for each input register), two paths to the multiplier and three paths available to the memory and data pads. This allows seven independent data words to be transferred each cycle. (These transfers, together with an add, a multiply and an address increment/decrement, make up the ten operations possible per cycle.) These separate data paths eliminate the need for a handshaking arrangement between logic elements, although handshaking is required when the AP-120B communicates with the host [7, 10].

The price of a unit is $50,R70.00 [10]. It includes:

- the AP-120B array processor;
- the interface with the PDP-11;
- 16K words of 333-nanosecond interleaved MOS memory;
- the expansion chassis and installation;
- 256 words of program source memory;
- 512 words of Read-Only Memory (ROM) table memory;
- a linker, loader, simulator, debugger, algorithm library and executive.

The price also includes a 90-day warranty, with a servicing agreement available at extra cost. The field test mean time between failure is 3500 hours [3].

The following section explains the hardware of the AP-120B in detail.

A. CHARACTERISTICS AND HARDWARE

1. Multiplier



The Multiplier unit (fig 6) consists of two 38-bit multiplier registers M1 and M2, three multiplication stages and a 38-bit register to store the result (FM). A result is available three cycles (500 nanoseconds) after the multiply is initiated. Inputs to the M1 register can come from Data Pad X (DPX), Data Pad Y (DPY), Table Memory (TM) or the Multiplier result register (FM). Inputs to M2 are from DPX, DPY, the Adder result register (FA) or the Main Data Memory Output Buffer (MD). Results from the multiplier can go to M1, the Adder input register (A1), the Main Data Memory Input Buffer (MI), DPX or DPY.

Stage one of the multiplier starts the product of fractions by beginning the multiplication of the two 28-bit mantissas. This multiplication is completed in stage two, resulting in a 56-bit mantissa. Stage three adds the exponents as it normalizes and convergently rounds the 56-bit mantissa to 28 bits. This stage also detects exponent overflow or underflow and, if either exists, sets the FO or FU bit in the status register. The program can read the status register to determine whether conditions are met by an arithmetic operation, to detect errors, or to drive branching logic. These bits are available for testing one cycle after completion of the multiply.
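Convergent rounding is round-to-nearest with exact ties sent to the even result, so ties round up and down equally often. Here is a minimal Python sketch on non-negative integers. The helper is our own illustration, not FPS hardware logic. It could be used, for example, to drop the low 28 bits of a 56-bit product magnitude.

```python
def convergent_round(value: int, drop_bits: int) -> int:
    """Discard the low `drop_bits` bits of a non-negative integer, rounding to
    nearest and breaking exact ties toward an even result."""
    if value < 0:
        raise ValueError("this sketch handles magnitudes only")
    if drop_bits <= 0:
        return value
    kept = value >> drop_bits
    remainder = value & ((1 << drop_bits) - 1)
    half = 1 << (drop_bits - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    return kept


# Ties: 0b1010_1000 stays at 0b1010 (even); 0b1011_1000 goes up to 0b1100.
print(convergent_round(0b1010_1000, 4), convergent_round(0b1011_1000, 4))  # 10 12

# Over a full set of remainders the rounding errors cancel exactly.
total_error = sum(convergent_round(v, 4) * 16 - v for v in range(32))
print(total_error)  # 0
```

In hardware, a rounding carry can overflow the mantissa (all ones rounding up), which forces one more normalization shift. The sketch leaves that step out.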

This three-stage multiply allows pipelining to be used since each stage is independent of the other two. As a result, a multiplication result can be present at the result register every 167 nanoseconds once the pipeline becomes full (three cycles are required to fill it). Note that 500 nanoseconds are required if the result of one multiplication is needed in the next multiplication, as is the case with scalar arithmetic.
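A simplified timing model of the three-stage multiplier shows the gap between the vector and scalar cases. The Python sketch below counts only multiplier stages at the quoted 167-nanosecond cycle. It ignores operand fetch and data-path conflicts, and the function names are our own.

```python
CYCLE_NS = 167      # AP-120B machine cycle
MULT_STAGES = 3     # multiplier pipeline depth


def independent_products_ns(n: int) -> int:
    """n independent products (vector case): fill the pipe, then one per cycle."""
    return (MULT_STAGES + n - 1) * CYCLE_NS if n > 0 else 0


def dependent_chain_ns(n: int) -> int:
    """n products where each needs the previous result (scalar case): no overlap."""
    return n * MULT_STAGES * CYCLE_NS


print(independent_products_ns(1000), dependent_chain_ns(1000))  # 167334 501000
```

For long vectors, the pipelined case approaches one result per cycle. The dependent chain stays at three cycles per result.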

A readily apparent problem with the multiplier is that M1 receives inputs from both the Table Memory (TM) and the Multiplier Result register (FM), while M2 receives inputs from neither. Consider a constant from TM that must be multiplied by the result of a just-completed multiplication. Either FM or TM would first have to be written into DPX or DPY and then written into M2, which costs an extra two cycles. This disadvantage is outweighed by the fact that, although the dedicated data lines cause the above problem, in most cases they present a distinct advantage by allowing multiple data transfers in any given cycle [32].

2. Adder

The operation of the adder (fig 7) is similar to that of the multiplier. It consists of two 38-bit adder registers A1 and A2, two adder stages and an adder result register (FA). The addition of two numbers requires 333 nanoseconds (two cycles).

Inputs to A1 come from Table Memory (TM), the Multiplier Output register (FM), Data Pad X (DPX), Data Pad Y (DPY) and the ZERO constant. Inputs to A2 come from the Adder Output register (FA), Data Pad X (DPX), Data Pad Y (DPY) and the ZERO constant. The results from the adder can go to M2, DPX, DPY or MI.

Stage one aligns the mantissas. It shifts the mantissa of the smaller value to the right, by the difference in exponents, until both exponents are equal, and then adds or subtracts the mantissas. Stage two normalizes and convergently rounds the mantissa and adjusts the exponent. This stage also sets four bits in the status register to denote results equal to zero (FZ), results less than zero (FL), exponent overflow (FO) or exponent underflow (FU). Other program instructions may test these bits one cycle after the addition is completed. (Note that FO and FU are the same bits that the multiplier sets on exponent overflow or underflow.)
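The two adder stages can be mimicked with integer mantissas. The Python sketch below is a toy model with stated simplifications:

- it uses sign-magnitude rather than the AP-120B's two's complement mantissa;
- it represents a number as mantissa x 2**exponent;
- it truncates bits shifted out during alignment instead of rounding convergently;
- it ignores the exponent-range checks that set FO and FU.

All function names are hypothetical.

```python
MANT_BITS = 28  # toy model: a normalized magnitude has bit 27 set, bits 28+ clear


def normalize(mant: int, exp: int) -> tuple[int, int]:
    """Stage two (toy): shift until 2**27 <= |mant| < 2**28, adjusting exp."""
    if mant == 0:
        return 0, 0
    sign = -1 if mant < 0 else 1
    mag = abs(mant)
    while mag >= 1 << MANT_BITS:        # carry out of the top: shift right
        mag >>= 1
        exp += 1
    while mag < 1 << (MANT_BITS - 1):   # leading zeros: shift left
        mag <<= 1
        exp -= 1
    return sign * mag, exp


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Stage one aligns and adds; stage two normalizes."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                          # make `a` the larger-exponent operand
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    shift = ea - eb
    sign_b = -1 if mb < 0 else 1
    mb_aligned = sign_b * (abs(mb) >> shift)   # shifted-out bits are dropped
    return normalize(ma + mb_aligned, ea)


def from_int(n: int) -> tuple[int, int]:
    return normalize(n, 0)


def to_float(x: tuple[int, int]) -> float:
    mant, exp = x
    return mant * 2.0 ** exp


print(to_float(fp_add(from_int(3), from_int(5))))   # 8.0
print(to_float(fp_add(from_int(3), from_int(-5))))  # -2.0
```

The first example carries out of the top bit and needs a right shift. The second cancels leading bits and needs a left shift. Between them they exercise both directions of normalization.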

As with the multiplier, the two-stage adder allows pipelining, and a result can be generated every 167 nanoseconds. The adder does not have the disadvantage of receiving Table Memory (TM) values at the same register as FA. However, the multiplier result FM arrives at the same adder input register (A1) as TM values, so an FM value cannot be added immediately to a TM value without first going through DPX or DPY [32]. ... 1 cycle). Otherwise there would be no loss of time, since the value in FM could be moved through DPX or DPY so that it is available at the adder/multiplier input register when necessary. This presupposes, of course, that the data paths to or from memory are not needed for other uses.

3. S-Pad

The S-Pad (fig 8) (short for scratch pad) consists of the S-Pad Memory, the S-Pad Arithmetic Logical Unit (ALU), the Data Pad Address Register (DPA), the Memory Address Register (MA) and the Table Memory Address Register (TMA). The sole purpose of the S-Pad is to compute addresses for Table Memory, Main Data Memory and the Data Pads. The S-Pad can operate concurrently with the memories, the Multiplier and the Adder [7].

The S-Pad Memory is made up of 16 registers, each 16 bits wide, giving the ability to compute an effective address of up to 64K. These registers may be addressed directly by number. Alternatively, they may be assigned label names such as "pointer" with pseudo-operators to make programs more readable.

The S-Pad Arithmetic Logical Unit forms the operand addresses. It also automatically counts loops and can shift addresses right once (divide by two), left once (multiply by two) or left twice (multiply by four). If required, it can also perform bit reversal, which swaps address bits so that data can be accessed in the scrambled order left by a Fast Fourier Transform. The results of the S-Pad arithmetic logic unit, called SPFN, set bits in the status register to indicate whether the results were less than zero (N) or zero (Z), or whether there was a carry (C). Program instructions can test these bits at the next instruction cycle.
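The bit-reversal operation and the divide/multiply address operations can be illustrated in software. In binary, a right shift divides by two and a left shift multiplies by two. This Python sketch uses a helper name of our own and does not reproduce the S-Pad hardware.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the low `bits` bits of `index`, e.g. 0b001 -> 0b100 for bits=3."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Bit-reversed order in which an in-place 8-point radix-2 FFT typically
# leaves its outputs:
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]

# Divide by two, multiply by two and multiply by four on an example address:
address = 20
print(address >> 1, address << 1, address << 2)  # 10 40 80
```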

TMA, DPA and MA store the computed address from the S-Pad ALU. The contents of each can be changed to the value of SPFN or incremented by one. One cycle is required to compute the address and load it into the proper register.

4. Table Memory

Table memory is a 512-word, 38-bit-per-word bipolar read-only memory used to store important and frequently used constants. This memory has a 167-nanosecond cycle time but requires two cycles to get a value from memory to the output register TM [7]. Values in TM are available for use by DPX, DPY, MI, M1 and A1. These values may be requested every machine cycle, and a request is initiated by changing the contents of the Table Memory Address Register (TMA) in the S-Pad. Because of the two-cycle access time, the programmer must control the timing so that the correct constant is at TM when needed.



In the Fast Fourier Transform, however, the hardware interprets the address in TMA as the angle that points to the appropriate root of unity for a particular step in the FFT algorithm. Therefore, a single quadrant of cosines can represent a full table [32].
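The quadrant-symmetry idea can be illustrated in software. This Python sketch stores cosines for one quadrant only and derives the cosine and sine of every angle 2*pi*k/N from that table. The names are our own, N is assumed to be a multiple of 4, and the sketch does not reproduce the AP-120B's hardware address decoding.

```python
import math


def quarter_cosine_table(n: int) -> list[float]:
    """cos(2*pi*k/n) for k = 0..n/4, i.e. a single quadrant of cosines."""
    if n % 4:
        raise ValueError("n must be a multiple of 4")
    return [math.cos(2 * math.pi * k / n) for k in range(n // 4 + 1)]


def cos_sin(k: int, n: int, table: list[float]) -> tuple[float, float]:
    """Recover cos and sin of the angle 2*pi*k/n from the quadrant table."""
    q = n // 4
    quadrant, r = divmod(k % n, q)
    if quadrant == 0:
        return table[r], table[q - r]
    if quadrant == 1:
        return -table[q - r], table[r]
    if quadrant == 2:
        return -table[r], -table[q - r]
    return table[q - r], -table[r]


table = quarter_cosine_table(16)   # 5 stored values serve all 16 angles
print(cos_sin(6, 16, table))       # cos and sin of 135 degrees
```

Only sign changes and the index reflection r -> q - r are needed. That is why one quadrant is enough for the FFT's roots of unity.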



There is an optional Random Access Table Memory (TMRAM) containing 1K of random access memory [8]. It allows constants needed for special applications to be loaded without the overhead of computing them every time or using valuable data pad space to store them. The price of this option is approximately $1,850.00 [7].



5. Data Pads X and Y

The Data Pads (fig 9) consist of sixty-four 38-bit accumulators. Sixteen of them are addressable in each instruction cycle, and four of those are available for use in that cycle [7]. These accumulators are divided into two 32-register blocks called Data Pad X (DPX) and Data Pad Y (DPY). In each Data Pad, one register can be read and another written during the same cycle.



There are two restrictions. The same register cannot
be read and written simultaneously, and a read and a write
operation during the same cycle must occur on registers
whose addresses differ by no more than 7, because of
base-address-plus-offset addressing. (However, a register in
DPX may be written at the same time as a register in DPY even
if both have the same address.) In the S-Pad, the Data
Pad Address Register (DPA) supplies the base address used
by the read/write instruction to locate the proper Data Pad
register. The DPA supplies both DPX and DPY concurrently.
The instruction uses this base address and an offset in the
form DPX(offset) or DPY(offset). It can address offsets from -4 to +3
relative to the base in each Data Pad to find the effective
address. Therefore, if the DPA contains decimal value 20,
registers 16, 17, 18, 19, 20, 21, 22 and 23 can be addressed
in each Data Pad. The register addresses of both Data Pads
range from 0 to 37 (base 8) and are arranged in a circular
addressing scheme. Therefore 37 (base 8) + 1 = 0. The
programmer need not be concerned about writing into a
nonexistent location, only about overwriting
previously written information.
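The base-plus-offset scheme above comes down to modular arithmetic on a 32-register ring. The following sketch uses hypothetical function names. It computes the effective register for a DPA value and an offset, and it checks the one-pad rule that a register cannot be read and written in the same cycle.

```python
DATA_PAD_SIZE = 32  # registers 0..37 (octal) in each of DPX and DPY


def effective_address(dpa, offset):
    """Effective Data Pad register for base DPA and an offset of -4..+3."""
    if not -4 <= offset <= 3:
        raise ValueError("offset must be in the range -4..+3")
    return (dpa + offset) % DATA_PAD_SIZE  # circular: 37 (octal) + 1 = 0


def reachable(dpa):
    """The eight registers addressable in one pad for a given DPA."""
    return [effective_address(dpa, off) for off in range(-4, 4)]


def can_read_and_write(dpa, read_offset, write_offset):
    """Within one pad, a read and a write in the same cycle need distinct registers."""
    return effective_address(dpa, read_offset) != effective_address(dpa, write_offset)
```

With DPA = 20 this reproduces registers 16 through 23. A base near the top of the pad wraps around to register 0.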

DPX and DPY receive information from MD, FA, FM,
DPX, DPY, the output of the S-Pad arithmetic logical unit (SPFN)
and VALUE (an immediate value used by immediate instructions
arriving from the command buffer). DPX and DPY supply
values to M1, M2, A1, A2, DPX, DPY and MI [32].

6. Main Data Memory 

Main Data Memory (fig 10) contains 64K 38-bit words,
used primarily to store input data which will be operated
on by the program. This memory is available in two forms:
167-nanosecond hardware-interleaved MOS with word
segments, or 333-nanosecond hardware-interleaved MOS with 8K
word segments.



Both memories have a two-bit parity option
available [7] and a one-megaword page selection option [9].
With memory limited to 64K, the largest complex-to-complex
Fast Fourier Transform possible is 32K, which may not be
acceptable in some applications.
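The 32K limit follows from simple arithmetic. Each complex point needs two words (real and imaginary parts), and the transform length is a power of two. The sketch below assumes an in-place transform with no extra workspace or tables held in Main Data Memory. It is an idealization rather than the actual APMATH memory layout.

```python
def largest_complex_fft(memory_words):
    """Largest power-of-two complex-to-complex FFT that fits in memory."""
    points = memory_words // 2  # real word + imaginary word per point
    size = 1
    while size * 2 <= points:
        size *= 2
    return size
```

For example, `largest_complex_fft(64 * 1024)` gives 32768. The one-megaword Page Select Option would raise the limit to 524288 points under the same assumptions.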

Main Data Memory receives input information into its
Memory Input Buffer (MI) from FA, FM, MD, DPX, DPY, TM, SPFN
and VALUE. It can output via the Memory Data Buffer (MD) to DPX,
DPY, A2 and M2.

Memory read or write may be requested every other
cycle by changing the value of the Memory Address Register
(MA) in the S-Pad. This yields an effective memory cycle
time of either 333 nanoseconds (167 nanoseconds plus one
machine cycle) or 500 nanoseconds (333 plus one machine
cycle), depending on the type of memory installed [34].
Special programming techniques and proper chip procurement
can reduce this overhead to the advertised memory speed,
provided that successive accesses alternate between
chips or between even and odd boundaries. If
effective speed is essential, it becomes the programmer's
responsibility to ensure that data location is known to the
program at all times [8]. A read requires three cycles for
information to be present in MD with 333-nanosecond
memory and two cycles with 167-nanosecond memory. This
information remains available until a new value overwrites
it. Suppose a write or read is initiated before two memory
cycles have elapsed (unless the special chips and techniques above
are used). The request will not be lost. Instead, the memory
automatically provides a hardware lockout (it waits until memory is
available for read/write) [14].
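The timing rules above can be sketched as follows. The sketch adds one machine cycle to the memory time and models the hardware lockout as a delay of any request that arrives before the memory is free. It assumes requests are spaced at least two machine cycles apart (the every-other-cycle rate stated above) and that request times are given in increasing order. With a 167-nanosecond machine cycle the first case sums to 334 ns; the text's 333-ns figure reflects rounding of the cycle time.

```python
MACHINE_CYCLE_NS = 167   # one AP-120B machine cycle, as quoted in the text
MIN_SPACING_CYCLES = 2   # assumption: one memory request per two machine cycles

# Cycles from request until the word is in MD, as stated in the text.
READ_LATENCY_CYCLES = {167: 2, 333: 3}


def effective_cycle_ns(memory_ns):
    """Effective memory cycle: memory access time plus one machine cycle."""
    return memory_ns + MACHINE_CYCLE_NS


def schedule_requests(requested, spacing=MIN_SPACING_CYCLES):
    """Cycle at which each request (given in increasing order) actually starts.

    A request issued too soon is not lost; it is held by the hardware
    lockout until the memory becomes free.
    """
    started = []
    next_free = 0
    for cycle in requested:
        start = max(cycle, next_free)
        started.append(start)
        next_free = start + spacing
    return started
```

Requests at cycles 0, 1, 2 and 6 would start at cycles 0, 2, 4 and 6. The second and third are stretched out by the lockout rather than discarded.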

The value in the Memory Address Register (MA) points
to the desired location in Main Data Memory. MA may
either be set to a specific value or be incremented or decremented by
one in the S-Pad. There is a slight time lag between
when a value is requested to be placed in a register and when it
actually gets there. The programmer must therefore always know
what values are in MI and MD, to allow the proper "set up"
time to get these values to the Adder, the Multiplier or
the correct DPX, DPY or MI address [32].

7. Program Source Module

The Program Source Module (fig 11) consists of the
Program Source Memory (PS), Program Source Address Register
(PSA), Control Buffer (CB) and the Subroutine Return Stack
(SRS) [32].

The PS is a high-speed, 50-nanosecond bipolar
memory addressable to 7K 64-bit words and is available in
256-word increments [4]. The PSA contains the address of
the next instruction. It is incremented by one after each
instruction unless modified by either the Control
Buffer (a new address as a result of a branch or jump
instruction) or the Subroutine Return Stack. The SRS saves
the current PSA when a Jump Subroutine instruction is
performed and increments the value of the Subroutine Return
Address (SRA). When a Return instruction is performed, the
SRA is decremented by one, which makes nested subroutines
possible. The Control Buffer decodes and executes the
instruction as the CPU would in a general purpose computer
[32].
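The PSA/SRS interplay above behaves like a small hardware stack. The following toy model is a sketch under two assumptions: the saved PSA already points at the instruction after the Jump Subroutine (consistent with the PSA holding the address of the next instruction), and the stack depth of 16 is hypothetical.

```python
class ProgramSourceModule:
    """Toy model of the PSA and the Subroutine Return Stack (SRS)."""

    def __init__(self, depth=16):   # hypothetical stack depth
        self.psa = 0                # address of the next instruction
        self.srs = [0] * depth      # saved return addresses
        self.sra = 0                # Subroutine Return Address: next free SRS slot

    def advance(self):
        """Normal sequencing: PSA incremented by one after an instruction."""
        self.psa += 1

    def branch(self, target):
        """Branch or jump: the Control Buffer supplies a new address."""
        self.psa = target

    def jump_subroutine(self, target):
        if self.sra == len(self.srs):
            raise OverflowError("subroutine return stack is full")
        self.srs[self.sra] = self.psa   # save the current PSA
        self.sra += 1                   # increment SRA
        self.psa = target

    def ret(self):
        if self.sra == 0:
            raise IndexError("return with an empty subroutine return stack")
        self.sra -= 1                   # decrement SRA
        self.psa = self.srs[self.sra]   # resume where the caller left off
```

Two nested calls push two return addresses, and two returns unwind them in reverse order. That ordering is what makes nesting work.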



8. Interface with PDP-11 Series



The interface unit with the PDP-11 series contains
two major segments: the Front Panel, and the DMA Controller
and Formatter. The Front Panel contains three registers and
is used mainly as a debugging aid. The DMA Controller
and Formatter contains five registers and is used for
program and data entry or removal.



a. Front Panel 

The Front Panel (fig 12) consists of three
16-bit registers: the Switch Register (SWR), the Lights
Register (LITES) and the Function Register (FN). The Front
Panel is used for bootstrapping and debugging of user
programs. These three registers can be examined by the host
and take the place of the toggle switches normally on the
front panel of the console [32]. With the Debugger program,
these registers can effectively breakpoint
the AP-120B at a selected program location or data address.
The Front Panel also allows each program to be single-stepped
through its execution sequence [6, 7].



The Switch Register is written by the host
computer but can be read by either the AP-120B or the host.
The SWR is used to enter data and addresses into the
AP-120B, primarily for debugging. Its contents can be fed
to DPY, MD or the S-Pad.

The Lights Register simulates the front panel
lights of the console. This register is set by the AP-120B
and can only be read by the host. LITES is used to display
selected contents of the internal registers of the AP-120B.

The final register is the Function Register,
which provides front-panel toggle-like controls to the
AP-120B. The FN can stop, start, step or reset the AP-120B.
It can also perform the following operations:
continue operation, resuming at the current value of the PSA;
examine a register, a portion of a register, or the memory
contents of a selected area;
deposit the contents of SWR into a selected register or memory location;
and then breakpoint according to the values of TMA, MA or DPA.
The FN can also increment the TMA, MA or DPA after
completion of an instruction to facilitate stepping through
memory locations [32].

The Front Panel is advertised to be invaluable
in troubleshooting when used in conjunction with the
interactive Debugger routine.
b. DMA Control

The DMA Control is the second half of the
interface. It consists of three 16-bit registers, one 18-bit
register and one 38-bit register. DMA Control is
responsible for transferring programs and data between the
AP-120B and the host computer. This section of the
interface also performs format conversion "on the fly", which
should effectively alleviate time lags [32]. Four
combinations of data transfer are possible:
host DMA to AP-120B DMA;
host DMA to AP-120B Programmed I/O;
host Programmed I/O to AP-120B Programmed I/O;
host Programmed I/O to AP-120B DMA.
The maximum theoretical burst transfer rate is three
megawords per second for all types of transfers [7].

The Format Register (FMT) is a 38-bit
double-buffered register used to perform all transfers of
floating-point numbers from the host to the AP-120B [32].
The FMT performs three conversions:
16-bit integers to 38-bit unnormalized floating-point numbers;
32-bit PDP-11 integers to 32-bit AP-120B integers;
32-bit floating-point numbers to 38-bit floating-point numbers.
All these operations are
also performed in reverse for the AP-120B to host direction [7]. Since the
PDP-11 is a 16-bit computer, it accesses the Formatter in
16-bit half-words. It must be noted that
for some applications, such as difference filtering, there
is a possibility of extreme accuracy loss due to 16-bit
integer to 38-bit floating-point conversion. The synthetic
precision generated by such a conversion can cause certain
coefficient combinations, such as and -1, to produce
errors when they are multiplied by mirrored arrays and the results are
reconverted to 16-bit format. The programmer must be aware
of these possible losses and test for them before faith is
placed in the result.
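Two parts of the paragraph above can be illustrated in code: how a 32-bit value travels through the Formatter as 16-bit half-words, and how a result can be tested before it is trusted. The sketch below makes several assumptions that the text does not confirm. It sends the high half-word first. It converts back to 16 bits by rounding and saturating. It treats any change in value as a loss. The helper names are hypothetical. Python's `round` uses round-half-to-even, which is another assumption about the conversion rule.

```python
def split_halfwords(value):
    """Split a 32-bit integer into (high, low) 16-bit half-words."""
    u = value & 0xFFFFFFFF          # two's-complement bit pattern
    return (u >> 16) & 0xFFFF, u & 0xFFFF


def join_halfwords(high, low):
    """Reassemble two 16-bit half-words into a signed 32-bit integer."""
    u = ((high & 0xFFFF) << 16) | (low & 0xFFFF)
    return u - (1 << 32) if u & 0x80000000 else u


def to_int16_checked(x):
    """Convert a floating result back to 16 bits and flag any loss.

    Returns (value, lost): lost is True if the result had a fractional
    part or had to be clipped to the 16-bit range.
    """
    r = round(x)
    clipped = max(-32768, min(32767, r))
    return clipped, clipped != x
```

A result of exactly 3.0 comes back as `(3, False)`. A result of 2.5 or 40000.0 is flagged, which is the kind of check the programmer is urged to make.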

The AP Direct Memory Address Register (APDMA)
points to consecutive locations in AP-120B Main Data Memory
during DMA transfers. This register can be automatically
incremented or decremented, allowing blocks of information to be
read into consecutive locations with minimal overhead.

The Host Memory Access Register (HMA) operates
similarly to the APDMA, except that it points to consecutive memory
locations in the host memory. In the PDP-11 this memory is
256K, so the HMA is 18 bits wide (2**18 = 256K) to allow for this addressing
capability.
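The three address and count registers described above cooperate during a block transfer, as this simulation sketches. Both memories are modelled as Python lists indexed by word, which ignores the PDP-11's byte addressing. The behaviour of the word count (decrementing to zero) is an assumption. The function name and parameters are hypothetical.

```python
def dma_transfer(host_mem, hma, ap_mem, apdma, word_count, to_ap=True, ap_step=1):
    """Simulate a block DMA transfer; return the final (HMA, APDMA) values."""
    while word_count > 0:
        if to_ap:
            ap_mem[apdma] = host_mem[hma]
        else:
            host_mem[hma] = ap_mem[apdma]
        hma = (hma + 1) % (1 << 18)             # 18-bit host address register
        apdma = (apdma + ap_step) % (1 << 16)   # 16-bit APDMA (no Page Select)
        word_count -= 1                         # assumed word-count behaviour
    return hma, apdma
```

Setting `ap_step=-1` shows the decrementing mode, in which a block is laid down in descending AP addresses.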



The Word Count Register (WC) counts the number
The final and most important register in the
interface is the Control Register (CTL). It controls the
direction and mode of transfer and the type of format conversion,
and provides certain status bits pertaining to the transfer.
Used together with HMA and/or APDMA, this register allows the
host to execute other programs and be interrupted when the
DMA is completed. The CTL also allows either the host or the
AP-120B to control the data transfer. (The AP-120B must
control transfer from a loaded program, since the executive
alone is not powerful enough to control data transfer [32].)
1. Executive and Associated Routines

The AP-120B provides executive and housekeeping
routines to increase the effectiveness of operation and
enhance program development.

a. APMATH

APMATH is a series of approximately 150 [8]
library functions, vector and matrix subroutines and signal
processing algorithms [7] written in AP-120B assembly
language [8]. Through the AP Executive, these routines are callable
from host Fortran, host Assembly or AP Assembly language [36].
These programs can reduce
run time and decrease programming time by providing some of
the most common array processing functions in subroutine-callable
form. These routines include data transfer and
control, basic vector arithmetic, matrix operations and the Fast
Fourier Transform, all of which work with both
real and complex data.

b. APEX 

APEX is the AP Executive routine. It is
resident in the host computer and allows the AP-120B to
communicate with the host computer via Fortran or host
Assembly language calls. APEX decodes subroutine calls from
the host computer [36] and directs the AP-120B to perform
the specified action. Both APMATH routines and user-written
routines may be called from the host computer for execution on the AP-120B
[12].

c. APAL 

The AP Cross Assembler (APAL) is a two-pass
assembler written in Fortran IV which requires memory in
the host computer to operate. APAL assembles source text
written in AP Assembly language into object code
understandable by the AP-120B. The assembler can also
produce an AP Assembly listing containing errors
from both passes, location counters, assembled data, the
symbol table and source statements.

APAL recognizes signed constants ranging from
-32768 to 32767 and unsigned constants from 0 to 65535. Both
may be represented in binary, octal (the default base),
decimal or hexadecimal. APAL allows free formatting but
recognizes a general source statement form with three parts:
an optional label followed by a colon;
multiple op codes separated by semicolons;
an optional comment introduced by a leading double quote (").
A statement holds one to ten operations totalling no more than
64 bits. Sixty-four bits is the maximum dictated by seven
data transfers, one add, one multiply and one address
increment/decrement.
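Parsing this statement form and running the two passes described above can be sketched as follows. The toy syntax is an assumption made for brevity. Each operation is a mnemonic with at most one operand, and each statement assembles into one instruction word. The mnemonics in the usage example are hypothetical, not actual AP-120B op codes. Only the default octal base is handled, and the other bases are omitted.

```python
MAX_OPS = 10  # one to ten operations per statement


def parse_statement(line):
    """Split a source line into (label, operations, comment)."""
    code, _, comment = line.partition('"')   # comment starts at a double quote
    label = None
    if ":" in code:
        label, code = code.split(":", 1)
        label = label.strip()
    ops = [op.strip() for op in code.split(";") if op.strip()]
    if len(ops) > MAX_OPS:
        raise ValueError("more than ten operations in one statement")
    return label, ops, comment.strip() or None


def parse_constant(text):
    """Read a constant in the default (octal) base and check its range."""
    value = int(text, 8)  # raises ValueError if text is not an octal number
    if not -32768 <= value <= 65535:
        raise OverflowError("constant does not fit in 16 bits")
    return value & 0xFFFF  # 16-bit pattern


def resolve(operand, symbols):
    if operand in symbols:
        return symbols[operand]
    try:
        return parse_constant(operand)
    except ValueError:
        return operand  # register name or other symbolic operand, kept as text


def assemble(lines):
    # Pass 1: give each statement one instruction address and record labels.
    symbols, statements = {}, []
    for line in lines:
        label, ops, _ = parse_statement(line)
        if label:
            if label in symbols:
                raise ValueError("duplicate label " + label)
            symbols[label] = len(statements)
        if ops:
            statements.append(ops)
    # Pass 2: every label now has an address, so forward references resolve.
    program = []
    for ops in statements:
        word = []
        for op in ops:
            mnemonic, _, operand = op.partition(" ")
            operand = operand.strip()
            word.append((mnemonic, resolve(operand, symbols) if operand else None))
        program.append(word)
    return symbols, program
```

The second pass is what lets a jump refer to a label defined further down the source, such as `JMP DONE` appearing before `DONE:`.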

Once the modules are written, APAL can be
operated interactively, allowing the programmer to build the
program at assembly time. APAL asks the operator
for the source file name, destination file name, etc., and
then prompts him for any missing items. If
there are errors in the module, these can be changed
dynamically without reassembling the entire module [4].

d. APLINK

The AP linker (APLINK) is written in Fortran IV
and requires approximately 10K of memory in the host
computer. APLINK performs the same functions as
any other link editor:
relocation and assignment of absolute addresses to the object module;
correlation of global entry symbols in one module with external
symbols in the other modules;
loading modules from the program library;
production of the final load module.
These functions are performed interactively, through dialogue between
APLINK and the user at the console.
Besides linking the modules, APLINK reports to
the console any symbols in a file which are undefined. It outputs
the symbol table and locations when requested, and
returns the high address and starting address to be used
with the Debugger routine [5].
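The linking steps listed above can be sketched briefly. The sketch places modules end to end from a base address and turns each module's global symbols into absolute addresses. It then reports undefined externals, the high address and the starting address. The dictionary layout of a module is hypothetical, and patching the individual references inside the code is omitted. The sketch mirrors the functions named in the text rather than APLINK's internals.

```python
def link(modules, base=0, entry=None):
    """Relocate modules end to end and resolve global/external symbols.

    Each module is a dict with 'name', 'size' (words), 'globals'
    {symbol: offset within module} and 'externals' [symbols referenced].
    """
    load_map, symbols, origin = {}, {}, base
    for module in modules:
        load_map[module["name"]] = origin            # relocation
        for sym, offset in module["globals"].items():
            if sym in symbols:
                raise ValueError("multiply defined symbol " + sym)
            symbols[sym] = origin + offset           # absolute address
        origin += module["size"]
    undefined = sorted({sym for module in modules
                        for sym in module["externals"] if sym not in symbols})
    high_address = origin - 1
    start_address = symbols.get(entry) if entry else base
    return load_map, symbols, undefined, high_address, start_address
```

In a two-module example where the main program references an FFT routine and a vector multiply that nobody defines, the multiply is reported as undefined. This is the kind of console report described above.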

e. APSIM

APSIM is the AP-120B simulator. It is designed
for developing programs when use of the AP-120B
is impractical or impossible because of production schedules.
APSIM emulates all hardware and timing characteristics of
the AP-120B and performs the mathematical routines
as closely as possible to the way the AP-120B would perform
them [34]. APSIM requires 32K words of memory in the host
computer [1].



f. APDEBUG

APDEBUG is the AP-120B interactive debugger
program, used for dynamic debugging of AP-120B
applications programs at run time. Changes can be made when
the problem is identified. APDEBUG then calls the APLINK
and APAL routines to insert the new object module and
continues with program development. APDEBUG can work in
conjunction with the simulator or the actual hardware [6].



g. Testing Software
There are three software modules available to
completely test the AP-120B hardware operations.

APTEST is the AP-120B path tester. This
software exercises the panel, DMA interface, internal
registers and memory to check for proper operation.

APPATH tests the internal data paths of the
AP-120B and returns diagnostics when it finds errors.

The Forward/Inverse Fast Fourier Transform Test
(FIFFT) verifies correct operation of the AP-120B's
arithmetic units. It performs Fast Fourier Transforms and
their inverses and compares the results with standard answers [12].
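FIFFT's internals are not described here. The principle it relies on, a forward transform checked against a known answer followed by an inverse that must reproduce the input, can be sketched with a recursive radix-2 Cooley-Tukey FFT in ordinary floating point. The tolerance of 1e-9 is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]  # twiddle * odd term
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def ifft(spectrum):
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def close(a, b, tol=1e-9):
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))


def fifft_style_check(x, expected_spectrum=None):
    """Forward transform, optional comparison with a standard answer, then inverse."""
    spectrum = fft(x)
    if expected_spectrum is not None and not close(spectrum, expected_spectrum):
        return False
    return close(ifft(spectrum), x)
```

A unit impulse is a convenient standard answer, because its spectrum is all ones.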

These packages can be used to help ensure proper
operation of the AP-120B before development or actual
operation. They also help with locating hardware faults
during system maintenance.

2. Programming Language

The Math Library of AP functions can be called from
host Assembly Language, Fortran or AP Assembly
Language [36]. However, to write a custom library function,
AP Assembly Language must be used, and the cross-assembler
translates it into an executable routine.

A detailed study of the programming language is not
needed here. It is enough to say that it is similar in
character to other assembly languages. There are
enough commands available to control AP-120B execution
properly and efficiently. Bit testing, conditional branching,
flag setting and arithmetic instructions are all part of the
instruction repertoire, which allows varied applications
programs to be written.

3. Page Select Option

The AP-120B can optionally be equipped with a
Page Select Option. This provides the ability to address
one megaword of main memory in the AP-120B by using host
main memory and virtual memory techniques. Each page can be
up to 64K words long (the full Main Data Memory size), must be
at least 8K, and 16 pages are available. The
Page Select Option increases the ability of the AP-120B to
work on larger transforms. However, paging overhead and
increased host involvement mean that it may not increase
the throughput rate.

This option modifies the AP Direct Memory Address
Register (APDMA) located in the DMA Control section of the
interface. It extends the register from 16 to 20 bits, giving 2**20
addressing capability (approximately one megaword). This
virtual memory ability is called the AP Memory Address
Extension (APMAE), and new addresses can only be loaded by
the host. Since the host controls all paging
operations, the AP-120B commands do not change,
inasmuch as the AP-120B only recognizes 64K word locations [9].
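Extending APDMA from 16 to 20 bits can be pictured as a 4-bit page number in front of the 16-bit address that the AP-120B already understands. The text does not give the bit assignment. The sketch below assumes the upper four bits select one of the 16 pages and that every page is a full 64K words (pages as small as 8K are not modelled).

```python
PAGE_BITS = 4      # 16 pages
OFFSET_BITS = 16   # 64K words per page: what the AP-120B itself addresses


def split_extended_address(addr):
    """Split a 20-bit extended address into (page, 16-bit offset)."""
    if not 0 <= addr < 1 << (PAGE_BITS + OFFSET_BITS):
        raise ValueError("address exceeds 20 bits")
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


def join_extended_address(page, offset):
    """Combine a page number and a 16-bit offset into a 20-bit address."""
    return (page << OFFSET_BITS) | offset
```

Under these assumptions the host changes only the page field. AP-120B programs continue to see offsets 0 to 65535.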



4. Programmable I/O Processor



The Programmable I/O Processor (PIOP) is a
microcodable microprocessor which acts like a high-speed channel
program controlling an input/output port. It can transfer
data at a six-megahertz burst rate or at a
three-megahertz sustained rate (assuming 167-nanosecond
Main Data Memory). The PIOP can be used with up
to eight external devices (such as A/D converters or mass
storage devices), thereby acting as an I/O bus controller.

The PIOP interfaces directly with the DMA Controller
in the interface unit. It has a 38-bit instruction word and a
20-bit arithmetic logical unit, and it can address
up to one megaword of memory, making it compatible with the Page
Select Option. Communication with the AP-120B is
accomplished via one of eight flags and four interrupts.
The microcode supports subroutines and has the logic to
perform jumps within its own code.

The PIOP must handle all handshaking and timing
considerations with both the external devices and the host
program to ensure data integrity. This can be complicated at
times, so a Programmable I/O Channel (PIOC) is also available.
It decreases flexibility but eases the programming burden
[33].



Neither the PIOP nor the PIOC provides a method of
connecting two AP-120Bs together in series without host
intervention. This tends to limit some of the possible
applications of the AP-120B.

C. PROGRAMMING, OPERATION AND EXECUTION

The AP-120B can use the parallel operation
of the adder, the multiplier and the data transfers to
speed program execution and increase throughput on large
data arrays. These parallel operations must be controlled
so that optimum execution speed is realized without
causing interlock or lockout. Lockout could eventually lead
to a program stoppage [1]. Most scientific data can
best be structured in array form. The array processor
can therefore work on it quickly and efficiently in its natural
state, whereas a general purpose computer must, in most cases,
restructure it [36].
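The gain from overlapping the adder, the multiplier and the data transfers can be shown with an idealized schedule. The sketch assumes a hypothetical three-stage loop body (fetch an operand, multiply, add) in which each stage takes exactly one cycle and has its own unit. The real AP-120B adder and multiplier pipeline depths are not described here. Done one element at a time, N elements cost 3N cycles. Overlapped, they cost N + 2 cycles.

```python
STAGES = ("fetch", "multiply", "add")  # hypothetical one-cycle stages


def sequential_cycles(n):
    """Each element passes through every stage before the next one starts."""
    return len(STAGES) * n


def overlapped_schedule(n):
    """Map cycle -> list of (stage, element) active in that cycle.

    Element i enters stage s at cycle i + s, so in steady state the
    fetch, multiply and add units all work in the same cycle.
    """
    schedule = {}
    for i in range(n):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, []).append((stage, i))
    return schedule


def overlapped_cycles(n):
    return len(overlapped_schedule(n))
```

For 100 elements this gives 300 cycles sequentially against 102 overlapped. Keeping that schedule free of conflicts is the programmer's job described above.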

Before the AP-120B can work on data, the data must first
be transferred from its locations in host memory to Main
Data Memory in the array processor. (Data can also be moved to Main Data
Memory from an external device via the PIOP. That case is not
covered here, because the PIOP is programmable
and the path and data options associated with it are therefore
many.) The data is transferred through the interface with the
APPUT(HOST, AP, N, TYPE) command (Put Data into the
AP-120B). As with the arguments of other AP-120B CALL
statements, HOST, AP, N and TYPE need not be stated
explicitly but can be expressions, integers or variables.
The host and AP-120B must be synchronized in their
operations so that computations do not proceed while data is still
being transferred to memory. APWD (Wait on Data) causes the
host to wait until data transfer is completed before it
resumes executing the program. APWR (Wait on Running)
causes the host to wait until the AP-120B has completed
one command before another is sent over. APWAIT is a
combination of APWD and APWR. One difficulty with
these commands is the cost of detecting completion of APWD,
APWR or APWAIT. If polling is used, the host must monitor the
progress of the execution. If priority interrupts are used,
the AP-120B must wait. Either way, the time
needed to complete the program increases.

Some of the host overhead can be eliminated by
not using the AP Wait on Running (APWR), AP Wait on Data
(APWD) or AP Wait (APWAIT) commands. This technique may
speed up program execution, but it should be used only when it
is absolutely necessary and when there is no chance that the
results will be processed before they are actually present
in the AP-120B Main Data Memory. Floating Point Systems
suggests that the program first be written and executed with
the APWR, APWD and APWAIT commands present and the results
recorded. Those instructions are then removed a few at a
time, and each time the results are checked against the
original results. This only works for specific applications
and does not conform to modern programming practices. It is
also extremely dangerous, since it does not allow for speed
fluctuations due to temperature variations.

When processing is complete, the data can be transferred
back to the host with the APGET() command, which operates in
the same manner as APPUT.
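The hazard that the wait commands guard against can be made concrete with a toy model. The sketch below is deterministic. A transfer issued by `apput` becomes visible in AP memory only when the host calls `apwd`, whereas real hardware completes it at some timing-dependent moment, which is exactly why removing the waits is dangerous. The method names echo APPUT, APWD and APGET but are hypothetical Python stand-ins, not the Fortran CALL interface. The TYPE argument and the asynchrony of APGET itself are omitted. `vsquare` is a made-up stand-in for a math-library routine.

```python
class SimulatedAP:
    """Toy model: a transfer lands in AP memory only after the host waits."""

    def __init__(self, size):
        self.memory = [0.0] * size
        self.pending = []  # transfers issued but not yet complete

    def apput(self, host, ap_addr, n):
        """Start a host-to-AP transfer; it completes asynchronously."""
        self.pending.append((ap_addr, list(host[:n])))

    def apwd(self):
        """Wait on Data: block until every pending transfer has landed."""
        for ap_addr, block in self.pending:
            self.memory[ap_addr:ap_addr + len(block)] = block
        self.pending.clear()

    def vsquare(self, src, dst, n):
        """Stand-in for a math-library routine running on the AP."""
        for i in range(n):
            self.memory[dst + i] = self.memory[src + i] ** 2

    def apget(self, ap_addr, n):
        return self.memory[ap_addr:ap_addr + n]
```

With the wait in place, squaring [1.0, 2.0, 3.0] returns [1.0, 4.0, 9.0]. Without it, the routine runs on stale memory and returns zeros with no error, which is the silent failure the text warns about.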

The application program resides in host memory and
is executed by the host. The host determines
which routines must be passed to the AP-120B and whether the
necessary data is present in the array processor. When an
ordinary routine is called, the host jumps to it and executes it.
If the routine is part of the math library (whether from
APMATH or a user-written math routine), the host first jumps
to APEX. APEX then performs five steps:
it loads the 64-bit instructions into the AP-120B Program Source Memory;
it calculates the remaining space available in the Program Source Memory;
it updates the PS location table;
it loads the parameters;
it initiates execution.
If the same routine is called again immediately,
it is not reloaded, since it is already present; only
the new parameters are loaded. If a different routine
is called, APEX first checks the PS location table to
see whether there is enough unused space to load it
without destroying any routines currently residing in
Program Source Memory. If not enough space is available, the
last-written program is overwritten with the newly
called routine (Last In First Out (LIFO)).
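The loading policy above behaves like a stack of routines in Program Source Memory. The following sketch models it with two assumptions the text does not state. Routines are packed contiguously from address 0. If removing the last-written routine does not free enough room, APEX keeps removing routines in LIFO order. Routine names and sizes in the usage example are hypothetical.

```python
class ProgramSourceLoader:
    """Toy model of APEX's Program Source (PS) location table."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = []  # (name, start, size), oldest load first

    def free_space(self):
        return self.capacity - sum(size for _, _, size in self.table)

    def resident(self):
        return [name for name, _, _ in self.table]

    def call(self, name, size):
        """Return 'params' if the routine is already loaded, else 'loaded'."""
        if name in self.resident():
            return "params"       # only the new parameters are sent
        if size > self.capacity:
            raise MemoryError("routine larger than Program Source Memory")
        while self.free_space() < size:
            self.table.pop()      # LIFO: overwrite the last-written routine
        start = self.capacity - self.free_space()  # next free address
        self.table.append((name, start, size))
        return "loaded"
```

Because overwriting always starts from the most recent load, a routine loaded first sits at the bottom of the stack. It survives every later call that fits in the space above it.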

The overhead required for each math library routine
call is between 100 and 1000 microseconds. One hundred
microseconds is the minimum time required to check the table
and move parameters. This minimum is incurred on
every call, even in looping operations, and during this period
the host must be available to the AP-120B. The result is
unnecessary host overhead. While the AP-120B is executing
a routine, the host can be freed to do other
tasks and can treat the AP-120B as a peripheral device. The
host can either be interrupted or use polling techniques
to determine whether the array processor needs assistance. In
either case, the programmer must know when a break
occurs. He can then ensure that routines are sequenced so that
the host can perform other operations and is
not burdened by many AP-120B services.

There are several ways to increase the free time available in the host:
transfer more than one vector with each APPUT or APGET command;
use the best AP-120B library calls for a given operation
(it is the programmer's responsibility to determine which AP
routines are best for each situation);
overlap host and AP-120B operations whenever possible.
Since every routine call requires host intervention,
several routines can be combined into one by writing a
special macro. This eliminates some host overhead, because only one
"call" statement is needed. (However, these macros must be small because of
limited AP-120B program memory.) Host overhead varies
between 100 and 1000 microseconds, with the higher value
caused by the maximum amount of data and program
transfer. Since overwriting follows
LIFO, some overhead can be eliminated by loading the
most used routines first. APEX must also be part of the
host's interrupt priority scheme (interrupt or polling).
Giving the AP-120B a high priority therefore minimizes the
overall wait time of the system due to interrupt waits [8].
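The benefit of combining routines into one macro can be estimated from the 100 to 1000 microsecond per-call range above. The loop below is hypothetical: 1000 iterations, each making three library calls, compared with one macro call per iteration. The sketch counts host overhead only, not AP computation time. It assumes the macro's own call overhead falls within the same range.

```python
MIN_CALL_US = 100    # minimum APEX overhead per call (from the text)
MAX_CALL_US = 1000   # maximum APEX overhead per call (from the text)


def host_overhead_us(calls):
    """(best, worst) host overhead in microseconds for a number of calls."""
    return calls * MIN_CALL_US, calls * MAX_CALL_US


iterations = 1000
separate = host_overhead_us(3 * iterations)  # three library calls per pass
combined = host_overhead_us(1 * iterations)  # one macro replacing them
```

Even in the best case the macro saves 0.2 seconds of host time over this loop. In the worst case it saves 2 seconds.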
V. MAP-300



The MAP-300 (Macro Array Processor) (fig 13) is manufactured
by CSP Incorporated, Burlington, Massachusetts. The basic
structure consists of three independent busses, an executive
routine, two parallel arithmetic units, an addresser and an
input/output handler. Each has its own clock, and they
operate in a parallel asynchronous fashion. The basic
logic units are:
the Central System Processor Unit (CSPU);
the Arithmetic Processor (AP), consisting of the Arithmetic
Processing Unit (APU) and the Addresser Processor Section (APS);
the Host Interface Scroll (HIS);
an optional Input/Output Scroll (IOS).
All except the CSPU use microcoded routines stored in their own
small memories and communicate with each other via flags set in
registers. (The CSPU stores its microcoded routines in main MAP
memory.) The Host Interface Module (HIM) section of the HIS,
the IOS and the CSPU are built around a standard Intel 3000
bit-slice microprocessor.

MAP-300 numbers are usually represented in a
32-bit floating-point format with three fields:
a one-bit sign;
a seven-bit exponent, biased by 64 (0 to 127 are the values
actually stored), giving a range of 16**-64 to 16**63;
a 24-bit mantissa.
The total range is 10**-77 to 10**76.
Sixteen-bit floating-point and 16-bit fixed-point
numbers are also available. MAP-300 main memory is
addressable in either 32-bit full-words or 16-bit half-words.
Eight-bit bytes can be accessed by packing pairs into a
16-bit half-word [18]. SNAP-II commands like VFIX8 assume
this packing exists [15]. The ability to address
half-words and/or bytes is important. It may increase the
efficiency of the program and array processor, and it allows
operations that might not otherwise fit
in a word-only addressable memory.
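The thesis gives the field widths but not the bit layout. The decoding sketch below assumes an IBM System/360-style hexadecimal layout: the sign in the most significant bit, then the excess-64 exponent, then a 24-bit fraction read as 0.F. This layout is consistent with the 16**-64 to 16**63 range quoted above. The byte-packing helpers assume that the first byte of a pair occupies the high half of the half-word. Both the float layout and the byte order are assumptions, not documented MAP-300 behaviour.

```python
def decode_map32(word):
    """Value of a 32-bit word: sign, excess-64 base-16 exponent, 24-bit fraction."""
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F                  # stored 0..127, bias 64
    fraction = (word & 0xFFFFFF) / float(1 << 24)   # interpreted as 0.F
    return sign * fraction * 16.0 ** (exponent - 64)


def pack_bytes(first, second):
    """Pack two 8-bit bytes into one 16-bit half-word (first byte high)."""
    return ((first & 0xFF) << 8) | (second & 0xFF)


def unpack_bytes(halfword):
    """Recover the two bytes from a packed half-word."""
    return (halfword >> 8) & 0xFF, halfword & 0xFF
```

Under this layout the word 0x41100000 decodes to 1.0 (exponent 65, fraction 1/16) and 0x42640000 decodes to 100.0.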

Although the MAP-300 is asynchronous, the advertised
average CSPU cycle time is approximately 70 nanoseconds. A
memory read/write operation takes about 500 nanoseconds
with 500-nanosecond MOS memory
(125 nanoseconds with bipolar). Full-word operands and
results starting on an odd address boundary, however,
require about two 500-nanosecond memory cycles. A
pseudo-operation can be used to ensure that operands are placed at
even-boundary locations [18].
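The odd-boundary penalty above can be counted directly. The sketch assumes that addresses count half-words and that a full word at an even half-word address occupies a single memory word, so one at an odd address straddles two. `align_even` plays the role of the alignment pseudo-operation; its name is hypothetical.

```python
def fullword_memory_cycles(halfword_address):
    """Memory cycles needed for a 32-bit operand at a half-word address."""
    return 1 if halfword_address % 2 == 0 else 2


def align_even(halfword_address):
    """Round up to the next even (full-word) boundary."""
    return halfword_address + (halfword_address & 1)


def access_time_ns(addresses, cycle_ns=500):
    """Total memory time for full-word accesses at the given addresses."""
    return sum(fullword_memory_cycles(a) for a in addresses) * cycle_ns
```

Three full-word operands at odd addresses cost 3000 ns. The same operands moved to even boundaries cost 1500 ns.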



The MAP-300 can operate in temperatures
from 0 to 50 degrees centigrade at 10 to 90 percent
humidity. It requires either 115 VAC or 230
VAC single phase, plus or minus ten percent, at 47 to 63
hertz. The weight is approximately 80 pounds.



The MAP relies heavily on internal parallel processing
to increase throughput and limit wait time. The MAP-300
stores the executive and array routines in its own memory
(as opposed to storing them in host memory). With
function lists and statements such as the MAP version
of the "DO", the MAP can operate independently of the
host after the program is initially loaded [19]. With the
three-bus structure, the MAP can in theory
simultaneously input into one memory and output from the second
while computing on the third. It need never use the
host except for initialization.
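The three-memory arrangement above amounts to triple buffering. At each step, block b is being input, block b-1 computed and block b-2 output, each in a different memory. The following idealized sketch assumes equal-length stages and blocks assigned to memories in rotation. The memory names are hypothetical.

```python
def triple_buffer_schedule(num_blocks, memories=("MEM0", "MEM1", "MEM2")):
    """One dict per step: role -> (memory, block) for input/compute/output."""
    schedule = []
    for step in range(num_blocks + 2):
        row = {}
        for role, lag in (("input", 0), ("compute", 1), ("output", 2)):
            block = step - lag
            if 0 <= block < num_blocks:
                row[role] = (memories[block % 3], block)  # block stays in one memory
        schedule.append(row)
    return schedule
```

Because consecutive blocks live in different memories, the three roles never collide on a memory. After two steps of filling, all three busses are busy every step.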

The MAP has a separate instruction set for the Central
System Processor Unit (CSPU), Arithmetic Processing Unit
(APU), Addresser Processor Section (APS) and Host Interface
Scroll (HIS). Because these processors work
independently, the instruction sets are less complicated
than they would have to be if operation were controlled
entirely from a central site. The number of
instructions per second attainable by the MAP-300 is data
dependent. An operation runs to completion as soon as all the
steps it needs are completed, as indicated by the correct flags
being set in pseudo-memory (discussed later). While the
APU carries out an addition or multiplication, the
unaffected processors can prepare the next word (half-word) of
information.
System flags are used to communicate between the processors.
There are four kinds:
General Purpose flags, available to the programmer for general
system communication;
Control flags, which control processor modes and operation sequencing;
Status flags, which indicate processor status;
Hardware Configuration flags [18].

The MAP-300 system installed for evaluation consisted
of the following items:
the MAP-300 processor;
an interface with the PDP-11 computer under the RSX-11M operating system;
24K words of 500-nanosecond MOS master memory (8K for each memory);
a power panel and expansion chassis;
installation;
an I/O driver;
the SNAP-II algorithm library, cross assembler, simulator and loader.
The price of the system was $44,500 [27].

A. CHARACTERISTICS AND HARDWARE
1. CSPU

The Central System Processor Unit (CSPU) (fig 14) is the
"Command Central" of the MAP-300 array processor. The CSPU
responds to commands from the host, transfers data to and
from the host, assists the APS in address calculations and
loads the program memories of the Arithmetic Processor and
Host Interface Module. The CSPU acts as a
front-end microcomputer that controls the actions of the
system.



The CSPU has a fast fixed-point arithmetic unit for
address calculations, an instruction register, an eight-register
accumulator file and a priority interrupt network.
It has access to the three main memories via the memory
busses and supplies the other MAP processors with the
program instructions they need from main memory. The CSPU
recognizes reentrant subroutines and multi-level indirect
addressing. It has no I/O capability of its own.
Instead, it instructs the Host Interface Scroll (or I/O Scroll)
to perform input or output operations to or from the host
(or external devices). The CSPU never halts; it always
enters the WAIT state after its instruction sequence
is completed.

CSPU Block Diagram
Figure 14



An important register in the CSPU is the Control
Status Register or C-State Word (CSW). It is a 32-bit
register containing the status of prior operations, the
program counter, and the source and destination
locations for block memory transfers. Fields of the
register can be combined to give hardware condition codes
for use in conditional operations, branches, jumps or
executes. The CSW also specifies on which bus instructions
or data are present and controls the interrupt responses for
other units.

The CSPU is the only processor in the MAP that can be
interrupted (the other processors can only halt or wait), and
it contains a 64-level interrupt priority system with one
interrupt device per level and three lines per device
(192 possible combinations). The CSPU may only be
interrupted between instructions. It will also nest and
queue lower-priority interrupts if a higher-priority
interrupt is received during the servicing of a lower-priority
interrupt. These interrupts are detected by
polling, and levels are polled only if they are above the
current interrupt level. Lower-level interrupts will
continue to exist but will not be recognized until the
higher-priority interrupts have been serviced.
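The polling behaviour described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not CSPI's logic: a larger level number is taken to mean higher priority, the lowest-numbered line of a device is serviced first, and the class and method names are hypothetical.

```python
# Sketch of the CSPU priority-interrupt scheme (assumptions, not CSPI logic):
# a larger level number means higher priority, and the lowest-numbered line
# of a device is taken first.
NUM_LEVELS = 64          # one interrupt device per level
LINES_PER_DEVICE = 3     # 64 levels x 3 lines = 192 combinations


class InterruptController:
    def __init__(self):
        self.pending = {}        # level -> set of active line numbers
        self.current_level = -1  # -1: no interrupt is being serviced
        self._saved = []         # interrupted levels, for nesting

    def raise_line(self, level, line):
        if not (0 <= level < NUM_LEVELS and 0 <= line < LINES_PER_DEVICE):
            raise ValueError("no such level/line")
        self.pending.setdefault(level, set()).add(line)

    def poll(self):
        """Called only between instructions. Levels at or below the current
        level are not polled, so their requests simply remain pending."""
        eligible = [lvl for lvl, lines in self.pending.items()
                    if lines and lvl > self.current_level]
        if not eligible:
            return None
        level = max(eligible)
        line = min(self.pending[level])
        self.pending[level].remove(line)
        self._saved.append(self.current_level)  # nest the interrupted service
        self.current_level = level
        return level, line

    def return_from_interrupt(self):
        self.current_level = self._saved.pop()
```

In this model a level-5 request raised while level 10 is being serviced is not polled; it stays pending and is taken only after the nested routines have returned.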



The CSPU contains no memory of its own but uses main memory to
store its instructions. When fetched, these instructions
are held in the instruction register until execution. The
CSPU may also address a pseudo-memory location called the System
Flag Register (SYSFLG), which is the primary inter-processor
communication system. By testing the bits of SYSFLG, the
CSPU can sense the status of any of the other processors.
(Pseudo-memory refers to memory physically located within
the sub-processors but which appears on the bus as a memory
address, similar to the PDP-11/34/45/55/60/70.) [18].
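Testing SYSFLG amounts to masking individual bits of a single word. The sketch below assumes hypothetical bit positions for each sub-processor's busy flag; the actual SYSFLG layout is not reproduced here.

```python
# Hypothetical SYSFLG bit assignments, for illustration only.
SYSFLG_BITS = {
    "APU": 1 << 0,
    "APS": 1 << 1,
    "HIS": 1 << 2,
    "IOS": 1 << 3,
}


def is_busy(sysflg: int, processor: str) -> bool:
    """True if the (assumed) busy bit for `processor` is set."""
    return (sysflg & SYSFLG_BITS[processor]) != 0


def idle_processors(sysflg: int) -> list[str]:
    """Names of all processors whose busy bit is clear."""
    return [name for name in SYSFLG_BITS if not is_busy(sysflg, name)]
```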

2. Arithmetic Processor

The Arithmetic Processor consists of two components:
the Arithmetic Processor Unit (APU) and the Addresser
Processor Section (APS).

a. APU



The Arithmetic Processor Unit (APU) (fig 15) is
responsible for the computation required in array processing
and executes programs relatively independently of the other
MAP processors, operating under the general control of the
CSPU. The APU consists of two adders, two multipliers (the
main distinction between the MAP-300 and the MAP-100 or
MAP-200, which have one each), 34 various registers and
three First-In-First-Out (FIFO) buffers for input and output
storage. The two adders and two multipliers permit parallel
processing of data to increase throughput. APU programs are
stored in main MAP memory and are sequentially
block-transferred to the APU program memory under control of
the CSPU.

Figure 15

The main units of the APU are the arithmetic
processors (AP1 and AP2). Each arithmetic processor
consists of an adder and a multiplier that may operate
simultaneously and independently of each other. Each adder
is fed by eight registers and each multiplier by four
multiplicand registers and four multiplier registers. The
results of the adder are routed to the result register R and
the multiplier loads the product register P. To transfer
data between the separate arithmetic processors, an exchange
register is provided.



APU memory consists of two 256-word 16-bit
side-by-side memories. The memory is initially loaded by
the CSPU from MAP memory and the APU is then put into the
run state. Instructions are sequentially decoded in the APU
to perform the specified algorithm. The instructions are
16 bits for each board (AP1 and AP2) and are executed in
parallel. They can perform addition, multiplication,
transfer of data and the setting of flags. These
instructions are decoded and the operation started as soon
as all necessary conditions are met. Immediately, the next
instruction is retrieved and decoded, and the APU attempts to
execute it. If the P or R register is involved in a
multiplication or addition that has not yet been
completed, the Input Queue (IQ) is empty or the Output Queue
(OQ) is full, the APU will go into a "wait" state. It will
remain in this "wait" state until the multiplication or
addition is completed or the other conditions are
satisfied. A problem can arise from the side-by-side 16-bit
memories used for program storage. Since there is only one
program counter and the AP1 and AP2 processors work in
parallel, the side-by-side memory acts as two halves of a
32-bit instruction register. Therefore, if one board (AP1 or
AP2) is forced to wait, the other must also wait, since the
next instruction may not be retrieved until the program
counter can be incremented.
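The lockstep penalty can be made concrete with a simple cycle count. This Python sketch assumes one cycle per instruction plus any wait cycles, and assumes AP1 occupies the upper half of the 32-bit word; both are illustrative choices, not documented MAP-300 timing.

```python
# Why AP1 and AP2 stall together: both 16-bit halves of a 32-bit word share
# one program counter, which can advance only when both boards are ready.
# Assumption: AP1 occupies the upper 16 bits of the word.

def split_word(word32: int) -> tuple[int, int]:
    """Return the (AP1, AP2) 16-bit halves of a side-by-side instruction."""
    return (word32 >> 16) & 0xFFFF, word32 & 0xFFFF


def lockstep_cycles(n_instructions: int,
                    ap1_waits: dict[int, int],
                    ap2_waits: dict[int, int]) -> int:
    """Total cycles for a straight-line program.

    ap1_waits / ap2_waits map an instruction index to the cycles that board
    must wait there (busy P/R register, empty IQ, full OQ). Each instruction
    costs one cycle plus the longer of the two waits, because the shared
    program counter cannot move until both halves are finished.
    """
    total = 0
    for pc in range(n_instructions):
        total += 1 + max(ap1_waits.get(pc, 0), ap2_waits.get(pc, 0))
    return total
```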



The Input Queue is a four-deep FIFO buffer which
services both AP1 and AP2. To get the next input data
field, the IQ must be advanced before the data is
transferred. If both boards request data without advancing
the queue, they will receive the same data, which may be
useful for certain applications. If they both simultaneously
try to advance the IQ, it will advance only once and give
AP1 priority, then advance a second time after the
transfer has been completed to give data to AP2.
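The read/advance distinction is the key point of the Input Queue. The following Python model is a hypothetical sketch (its class and method names are not MAP-300 instructions) in which reading never consumes a word and only an explicit advance moves the queue.

```python
from collections import deque


class InputQueue:
    """Sketch of the four-deep IQ shared by AP1 and AP2.

    read() returns the current word without consuming it; only
    advance() moves the queue on.
    """
    DEPTH = 4

    def __init__(self):
        self._fifo = deque()
        self._current = None   # word presented to AP1/AP2

    def push(self, word):
        """Fill side: a word arriving from MAP memory."""
        if len(self._fifo) >= self.DEPTH:
            raise OverflowError("IQ full")
        self._fifo.append(word)

    def advance(self):
        if not self._fifo:
            raise LookupError("IQ empty: the APU would enter the wait state")
        self._current = self._fifo.popleft()

    def read(self):
        return self._current


def simultaneous_advance(iq: InputQueue):
    """AP1 and AP2 both request an advance in the same cycle: the queue
    advances once for AP1, then again for AP2 after AP1's transfer."""
    iq.advance()
    ap1_word = iq.read()
    iq.advance()
    ap2_word = iq.read()
    return ap1_word, ap2_word
```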



There are two Output Queues, each of which is a
four-deep FIFO buffer. These queues allow maximum utilization
of the adder and multiplier, since it is less likely that the
processor will have to wait for either buffer to have a
vacancy due to a busy bus system. If both processors try to
act on any single OQ, processor AP1 will be given priority.



A typical multiplication takes approximately six
cycles (420 nanoseconds) and a typical add takes about three
cycles (210 nanoseconds). Therefore, to increase
throughput, "hiding" adds, moves, etc. behind multiplies
will accomplish those operations in the time it takes to do the
multiply alone. The most efficient method to program the
MAP-300 is to treat successive sample sets in alternate
processors; this effectively produces a multiply every
210 nanoseconds. Since there is one input queue, this
method allows both processors to have access to the same
information (by not advancing the queue) and also gives a
greater chance to use hiding effectively.
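The arithmetic behind hiding can be checked with the figures given in the text. Six cycles at 420 ns and three cycles at 210 ns imply a 70 ns cycle; the sketch below uses that value and ignores queue waits and bus contention, so it gives best-case numbers only.

```python
# Best-case timing model using the text's figures: multiply = 6 cycles,
# add = 3 cycles, 70 ns per cycle. Queue waits and bus contention ignored.
CYCLE_NS = 70
MULTIPLY_CYCLES = 6
ADD_CYCLES = 3


def serial_ns(n_multiplies: int, adds_per_multiply: int) -> int:
    """No overlap: each add waits for the multiply before it."""
    return n_multiplies * (MULTIPLY_CYCLES + adds_per_multiply * ADD_CYCLES) * CYCLE_NS


def hidden_ns(n_multiplies: int, adds_per_multiply: int, processors: int = 1) -> float:
    """Adds run in the adder while the multiplier works, so each step costs
    the longer of the two; alternating sample sets between AP1 and AP2
    divides the total by the number of processors."""
    step = max(MULTIPLY_CYCLES, adds_per_multiply * ADD_CYCLES) * CYCLE_NS
    return n_multiplies * step / processors
```

With one add per multiply, overlapping reduces each step from 630 ns to 420 ns, and alternating sample sets between AP1 and AP2 gives the one multiply every 210 ns quoted above.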



The APU can operate in two modes. Mode
One, the normalized mode, can use either normalized or
unnormalized floating-point numbers as input, with the
result being a normalized floating-point number. Using
unnormalized floating-point numbers as input can lead to
precision loss, since the normalization process will shift
the mantissa to the left (values less than .1) or to the
right (values greater than 1.0). The vacancies created by
these shifts will be filled with zeros, which, after
computation, could possibly produce an unusual truncation.
The unnormalized mode will accept unnormalized numbers as
input and will return unnormalized numbers as output [16].
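The zero-fill and truncation effects can be seen in a toy binary format. The sketch below is not the MAP-300 floating-point format: it assumes an 8-bit unsigned fraction mantissa and a separate integer exponent purely to show what the normalizing shifts do.

```python
def normalize(mantissa: int, exponent: int, bits: int = 8) -> tuple[int, int]:
    """Toy binary normalization: value = mantissa / 2**bits * 2**exponent.

    The mantissa is shifted until its top bit (of `bits`) is set.
    Left shifts fill the vacated low bits with zeros; right shifts of an
    over-range mantissa drop low bits, i.e. truncate.
    """
    if mantissa == 0:
        return 0, 0
    while mantissa >= 1 << bits:        # too large: shift right, lose low bit
        mantissa >>= 1
        exponent += 1
    while mantissa < 1 << (bits - 1):   # leading zero: shift left, zero-fill
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent
```

Here normalize(3, 0) returns (192, -6): the six zeros shifted in carry no real precision. normalize(321, 0) returns (160, 1), and the dropped low bit means the result represents 320 rather than 321 (in units of 2**-8).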

b. APS

The Addresser Processor Section (APS) (fig 16)
computes both the address in MAP memory for the location of
input data words to be processed by the APU and the MAP
memory addresses for the output from the APU. It operates
independently of other processors, within the status and control
flag constraints of SYSFLG. The APS contains a 128-word
25-bit memory, four program counters (two for read and two
for write), eight address buffers (to be used as inputs to
the adder), four First-In-First-Out (FIFO) buffers, an
arithmetic logic unit (adder), and associated logic and
control units.

The APS programs are stored in MAP main memory
and are loaded by the CSPU. Certain absolute address
locations, which are not available during program writing,
must be known to an APS program at run time. The assembler
computes them at assembly time and the CSPU inserts them
into the proper location during the program transfer. The
CSPU then initiates APS operation by setting the proper
flags. The APS may be loaded with new information by the
CSPU during run time by cycle stealing, thereby not causing
the APU to slow and wait for a value in the IQ or a space in
the OQ. Because the instructions in MAP memory are 32 bits
long and the APS instruction is only 25 bits long, the seven
bits left over are used to store the APS memory address for
that instruction. This allows the CSPU to increase
throughput by immediately installing the instruction into
the correct location in a pre-computed order.
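Packing a 25-bit APS instruction with its 7-bit destination address into one 32-bit word can be sketched as below. The bit layout (address in the upper seven bits) is an assumption made for illustration; the text states only that the seven spare bits hold the APS memory address.

```python
APS_INSTR_BITS = 25
APS_ADDR_BITS = 7                       # 2**7 = 128 APS memory words
INSTR_MASK = (1 << APS_INSTR_BITS) - 1
ADDR_MASK = (1 << APS_ADDR_BITS) - 1


def pack(instruction: int, aps_address: int) -> int:
    """Build a 32-bit MAP memory word (assumed: address in the top 7 bits)."""
    if not 0 <= instruction <= INSTR_MASK:
        raise ValueError("APS instruction must fit in 25 bits")
    if not 0 <= aps_address <= ADDR_MASK:
        raise ValueError("APS address must be 0..127")
    return (aps_address << APS_INSTR_BITS) | instruction


def unpack(word: int) -> tuple[int, int]:
    """Return (aps_address, instruction) from a packed word."""
    return (word >> APS_INSTR_BITS) & ADDR_MASK, word & INSTR_MASK


def install(words, aps_memory=None):
    """Place each instruction directly at its own APS address, in any order."""
    if aps_memory is None:
        aps_memory = [0] * (1 << APS_ADDR_BITS)
    for word in words:
        address, instruction = unpack(word)
        aps_memory[address] = instruction
    return aps_memory
```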

The adder computes addresses dependent on prior
computational results, literals or specified increments.
All address addition and subtraction is considered to be
modulo 2^17, so that only positive addresses in that range
will be computed. Results are queued in either the Read
Address FIFO (RAF) or Write Address FIFO (WAF). Along with
the address is a code to delineate whether the address is
full-word, half-word or byte (pair of bytes in a 16-bit
half-word address) and whether it is an eight-bit fixed-point
number, 16-bit fixed-point number, 16-bit floating-point
number or a 32-bit floating-point number.
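The modulo 2^17 rule means that address arithmetic wraps instead of going negative. The Python sketch below generates a strided run of address entries of the kind queued in the RAF or WAF; the numeric codes attached to the four data formats are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

ADDRESS_MODULUS = 1 << 17   # APS address arithmetic is modulo 2**17


class DataFormat(Enum):
    """The four data types named in the text (code values are hypothetical)."""
    FIXED_8 = 0
    FIXED_16 = 1
    FLOAT_16 = 2
    FLOAT_32 = 3


@dataclass(frozen=True)
class AddressEntry:
    address: int
    fmt: DataFormat


def strided_addresses(start: int, stride: int, count: int, fmt: DataFormat):
    """Generate `count` FIFO entries; negative strides wrap to positive addresses."""
    return [AddressEntry((start + i * stride) % ADDRESS_MODULUS, fmt)
            for i in range(count)]
```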

The distinctive feature of the APS is that there
are four program counters (P0, P1, P2 and P3). These allow
four separate programs to be stored in the APS and executed
in an interleaved manner. Sequencing of these programs is
controlled by the status of the WAF and RAF in conjunction
with the APS instructions. These program counters also
provide a looping ability, allowing the APS to work with the
Host Interface Scroll or I/O Scrolls to keep data flowing.
After one memory has been processed and reloaded, the APS
need not be reinitiated but can continue operation on the
new data by this looping feature [18].

3. Host Interface Scroll 

The Host Interface Scroll (HIS) consists of two
subsections: the Host Interface Module (HIM) (fig 17), which
is located in the MAP-300, and the Host Interface Controller
(HIC), which is located in the host memory. The Host
Interface Module transfers MAP programs, unprocessed data,
host status and Host Interface Controller commands from the
host to the MAP. Processed data, MAP status and processing
commands are also transferred from the MAP to the host via
the HIM. A programmable scroll processor is provided for
computing MAP and host memory locations during a Direct
Memory Access (DMA) operation. Other pertinent devices
include a memory-bus interface, controllers for host memory,
format conversion hardware, and status and control logic along
with interrupt logic.

The HIC controls the handshaking necessary between
the host and the MAP. The handshaking consists of interrupt
logic from MAP to host and logic necessary for controlling
the transfer of data with either the Direct Input/Output (DIO)
facility or DMA transfer [18].

The host generally interrupts the MAP to initiate
program sequencing. However, when the MAP has completed its work, it
will initiate communication (interrupt) with the host for
further work. When the interrupt is acknowledged by the
host, more data or programs are sent to the MAP depending on
the flags. (If all processors are in a loop operating
on data supplied from external devices and delivered to
external devices via I/O Scrolls, the host will not be
interrupted unless there is an error. This frees the host
to do any other unrelated processing necessary.) The
maximum response time to initiate an interrupt is 150
microseconds for the HIM and 250 microseconds for a user
CALL routine [35].

4. Memory

Main memory in the MAP-300 consists of three
independent busses, each having the capability of 256K words
of 500-nanosecond MOS memory or 64K words of bipolar memory.
Memory types may not be intermixed on any given bus, but each
bus may have a different type from another bus. Memory can
also be either master or slave: master memory is used to
control program execution and to arbitrate and observe system
protocol, while slave memory stores the data. Each memory
bus containing memory is required to have at least one
master memory module (available in either [...] or 8K blocks
for MOS or 1K, 2K, or [...] blocks for bipolar).

Access to each memory is via a common bus having 11
ports and two priority levels. Three ports are reserved to
be used with the absolute priority scheme, leaving eight
ports with a sequential round-robin ("polite") priority
scheme. Absolute priority is the highest priority and is
intended to be used with high-speed, minimally-buffered
devices such as disc units or tape units, where loss of data
may otherwise result. Sequential round-robin priority handling is
used for slower buffered devices and is a round-robin
(circular) queue which is checked each memory cycle. The
device first in the queue will get the next memory cycle.
Scanning for the next queued device will commence
immediately upon the previous device starting its transfer.
When the next memory cycle occurs, the new device will already
be known, keeping overhead minimal. Of these 11 ports, the HIS and
CSPU each have one dedicated port and the [...] has two
dedicated ports on each bus, with seven ports remaining for
the IOS and other uses.
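The two-level arbitration can be modelled as a fixed-order check of the absolute-priority ports followed by a rotating scan of the polite ports. Port names and the fixed ordering among the three absolute ports are assumptions for illustration.

```python
class BusArbiter:
    """Sketch of a two-level memory-bus arbiter (hypothetical port names)."""

    def __init__(self, absolute_ports, polite_ports):
        self.absolute = list(absolute_ports)  # checked first, in fixed order
        self.polite = list(polite_ports)      # round-robin ("polite") ports
        self._next = 0                        # where the next polite scan starts

    def grant(self, requests):
        """Return the port that gets the next memory cycle, or None."""
        for port in self.absolute:
            if port in requests:
                return port
        n = len(self.polite)
        for offset in range(n):
            port = self.polite[(self._next + offset) % n]
            if port in requests:
                # Resume the circular scan just past the port just served.
                self._next = (self._next + offset + 1) % n
                return port
        return None
```

Because the scan resumes after the last port served, two polite devices that keep requesting take alternate cycles, while an absolute-priority request always wins the next cycle.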

Pseudo-memory (alluded to earlier) occupies the upper
words of Bus 1 and contains the addresses of certain registers
used for status and control. These registers are located in
the sub-processors but appear as addresses on the memory
bus. Any sub-processor may alter the contents of these
locations, so it is important that the programmer not try to
overwrite these addresses with programs or data [18].



B. SOFTWARE SUPPORT



As with the AP-120B, there are software routines to aid
in program development and execution.

1. Executive and Associated Routines 
a. Assembler 

The MAP-300 assembler, written in ANSI Fortran
IV, takes a source program written for either the CSPU, APU,
APS, HIS or IOS and creates an executable object module. A
listing file and errors file can also be created. Editing
and updating can be accomplished from the last source file
by changing and assembling only the incorrect line (or
lines) of code, thereby avoiding the reassembling of the
entire program [18]. The assembler will also allow changes
to the HIM memory to enable it to handle necessary
buffering.

b. Simulator

The MAP Simulator Program simulates model 200
and model 300 processors by executing MAP object code. The
simulator permits the programmer to develop or debug
software off-line so as not to disturb production schedules.

The MAP Simulator Program has the capability of
simulating the operation of the APU, APS, CSPU, memory and
the interrupt handler. It has not been updated to handle
certain new commands and flags (listed in the front of
ref [25]), nor does it have the ability to simulate the APU
test mode. Memory size and type can be specified either in
the initial loading of the simulator or while running to
tailor it for current or proposed configurations.

When used as a debugging aid, the MAP Simulator
Program allows the operator to: install breakpoints and
execute macro instructions at these breakpoints; detect
program errors and execute macro instructions after their
discovery; examine register contents; run programs from
different processors (APU, CSPU, etc.) independently; and
patch loaded programs. Input/output may be obtained from a
terminal, printer, tape (magnetic or paper), cards or
cassette. A batch mode is also available. Actual program
timing can be estimated by installing breakpoints and
individually timing small sections of code [25].

c. Loader

The MAP Loader is a Fortran program which
accepts object code produced by the Assembler and creates
blocks of binary code in MAP machine language. This code is
transmitted to the MAP memory via the MAP driver through the
Host Interface Scroll. Errors in transmission are
detectable, since check-sum digits are transmitted to the MAP
along with the blocks of code. The Merge operation creates
and updates the tables and addresses necessary if the loaded
module is to be used with the SNAP-II executive [22].
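Block checking of this kind can be illustrated with a simple additive checksum. The Loader's actual check-sum algorithm is not described in the text, so the 16-bit sum below is an assumed scheme, not the MAP format.

```python
def checksum(words, bits: int = 16) -> int:
    """Additive checksum modulo 2**bits (an assumed scheme)."""
    return sum(words) % (1 << bits)


def make_block(words, bits: int = 16) -> list[int]:
    """Append the checksum word to a block of code words before transmission."""
    return list(words) + [checksum(words, bits)]


def verify_block(block, bits: int = 16) -> bool:
    """Recompute the checksum on receipt and compare with the one sent."""
    *words, received = block
    return checksum(words, bits) == received
```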
d. Debug Package



The MAP-300 diagnostic package is designed to
verify hardware operations and isolate any malfunction to a
specific card. One module is resident in the host while
another, which contains the test modules and test programs
necessary to determine proper system operation of the CSPU
and other sub-processors, is present in the MAP. This
software can run interactively or under batch processing
[18].

The MAP-300 LOOK program permits the programmer
to examine MAP memory (or pseudo-memory) from any computer
capable of operating under ANSI Fortran IV. This is also an
interactive routine and provides the ability to "patch"
coded program segments or enter entire machine language
programs. The programs or segments can then be stepped
through to examine the results closely [20].

2. SNAP-II

Systematic Notation for Array Processing Version II,
or SNAP-II, is a single-command, high-level, macro-type
language used to program the MAP-300 array processor. The
SNAP-II package consists of a Host Support Module, Host/MAP
driver module, SNAP-II Executive, SNAP-II Function Modules
and an Installation Test and Acceptance Test Module [18].

The SNAP-II executive permits the user to define
buffer size and the structure and location of programs in
MAP memory. The executive also structures the routines to
operate at maximum speed by ensuring that the maximum
possible parallelism exists between sub-processors (for
CSPI-written functions), thereby accentuating "hiding". The
SNAP-II subroutines are written in ANSI Fortran and passed
to the MAP via Function Control Blocks (FCB). The MAP
Driver, which is located in the host, directs the loading
and operation of the programs. (In a loop or "Map While"
condition, the driver need only load and initiate the
sequence, then return control to the host operating system.)

SNAP-II allows the programmer to build his own
function lists with the Fortran type statement "Map Begin
Function List" (MPBFL()), which permits the host to remain as
free as possible from the operation of the MAP.
Two-dimensional arrays are demultiplexed by SNAP-II, thereby
increasing speed of execution in the processor by not having
to compute two-dimensional address structures. SNAP-II
functions are callable from either ANSI Fortran or host
assembly language programs and are able to operate on both
real and complex data [15].

3. Programming Language 







If SNAP-II functions are not specific enough, programs may be
written in an assembler-type language. The CSPU, APU, APS and HIS each
have their own instructions to optimize each sub-processor's
capabilities.



The CSPU instructions are broken into 10 groups
which have the ability to perform all the functions that a
general-purpose computer is normally visualized as
performing. They include: generic (performs interrupt
system coding and looping); single register; move; logical;
push and pop; hop and jump (a hop is within 256 half-word
locations and a jump can be to any new location); skip and
bit manipulation; compare; and maintenance and test console
instructions. The APU can perform: two-argument adder;
single-argument adder (like approximate reciprocal
instructions); multiply; data transfer; jump and call; and
control operation instructions. The APS performs: load;
address increment; register arithmetic and control type
instructions. The HIS recognizes: single register; logical
register; arithmetic register; literal and control
instruction types [18].



Since each sub-processor is designed to perform a
special operation and can be programmed to optimize that
design, the overall performance of the system is increased.
All processors perform in parallel and stay in "sync" by the
use of flags. A sub-processor will wait until the proper
flag is set before continuing, thereby ensuring integrity.
The waiting also relieves the programmer of "counting
cycles" with No Operation (NOP) instructions, which could
possibly cause lost data. The drawback is the increased
complexity of ensuring that the proper flags are
set at the proper time [16]. Most of these encumbrances are
eliminated by the executive, however. Flags are available in
pseudo-memory and are easily tested. The complexity issue
is minimal since, for most applications, only APU and APS
routines need be written. Only under special circumstances
is a CSPU or HIS routine required.
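Flag-based synchronization can be illustrated with Python threading events standing in for flag bits. In this hypothetical sketch an APS-like producer places one value at a time in a shared slot and an APU-like consumer (doubling each value as a placeholder computation) waits for the data-ready flag, so neither side needs to count cycles.

```python
import threading


def run_pipeline(samples):
    """Pass samples from a producer thread to a consumer thread using two
    flags, so each side waits for the other instead of counting cycles."""
    data_ready = threading.Event()   # stands in for a "data ready" flag bit
    slot_free = threading.Event()    # stands in for a "buffer free" flag bit
    slot_free.set()
    slot = {}
    results = []

    def producer():                  # APS-like: supplies operands
        for value in samples:
            slot_free.wait()
            slot_free.clear()
            slot["value"] = value
            data_ready.set()

    def consumer():                  # APU-like: computes on each operand
        for _ in samples:
            data_ready.wait()
            data_ready.clear()
            results.append(slot["value"] * 2)   # placeholder computation
            slot_free.set()

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```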

Pseudo-operations are also available to ease the
programming burden. They perform such tasks as naming
character strings, ensuring that information is placed into
memory on a word boundary, generating constants and making a
test Control Status Word (CSW).



4. I/O Scrolls



The I/O Scrolls (IOS) control block transfers to or
from external peripheral devices (including other MAPs)
without interfering with the MAP-300 processing cycle by
using a sub-processor which can be pre-programmed. The IOS
contains three functional elements: protocol logic necessary
to interface the external device directly to the MAP-300
memory busses; a programmable processor to compute MAP
addresses and issue control signals; and the transfer logic
necessary to interface with peripheral devices.



There are five basic IOS models. IOS1, also known
as the maintenance and test console, is capable of
transferring eight-bit single words to MAP bus number one at
a [...] kHz rate. IOS2 has two transfer rate options and two
word size options available. Word size option one utilizes
block transfers of 8- or 16-bit words to any of the three
MAP busses, while option two uses either 16- or 32-bit words.
Transfer rate option one conveys information at a 1 MHz rate,
compared to the 2.5 MHz rate of option two. Either
transfer rate option may be combined with either word size
option; however, only one combination is available at a time
since they are hard-wired. Under program control, IOS3 can
transfer either 16- or 32-bit words to any of the three
busses at a 750 kHz sustained rate. IOS3 can also perform
format conversion, monitor data with a basic operation
similar to the HIM and support indirect addressing. IOS4 is
a high-speed (up to 40 MHz) scroll, allowing block transfers
only of 8-, 16-, 32- or 64-bit words to any bus (64-bit words
must be transferred simultaneously to bus 2 and bus 3).
IOS4 also allows packing and buffering of data [18]. IOS5
is a direct memory-to-memory bus-connect option for direct
data transfer between user devices and the MAP-300. The
module requires no software (and will not support software).
Its operation is controlled by hardware and three interrupt
request lines [21].

a. Analog Data Acquisition Module

The Analog Data Acquisition Module model 5120
(ADAM-5120) is a programmable analog interface capable of
accepting from 2 to 16 channels of analog information. This
information is then digitized to 12-bit resolution at a 270
kHz throughput rate for the 16-channel case (125 kHz for a
single channel). As with the I/O Scrolls, the A/D operation
may take place simultaneously with the MAP-300 processing.
The ADAM is functionally equivalent to the IOS2, differing
only in its added analog-to-digital circuitry. This allows
the ADAM to be SNAP-II compatible.

The operation of the ADAM is carried out via a
set of up to 16 sample-and-hold units, which make their
signals available to a 16:1 multiplexer. Each channel of
the multiplexer is then consecutively sampled by the A/D
converter, which outputs either a 16-bit sign-magnitude or
16-bit floating-point number. Performance accuracy is
specified as 0.2 percent of full-scale resolution [2].
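The multiplexed sampling and the sign-magnitude output format can be sketched as follows. The sketch assumes bit 15 is the sign and the lower 15 bits hold the magnitude of a right-justified 12-bit sample; the floating-point output option is not modelled. If the 270 kHz figure is the aggregate rate, each of 16 channels would be sampled at about 16.9 kHz (270,000 / 16 = 16,875).

```python
SIGN_BIT = 0x8000
MAGNITUDE_MASK = 0x7FFF


def to_sign_magnitude16(value: int) -> int:
    """Encode a signed integer as 16-bit sign-magnitude (bit 15 = sign)."""
    if abs(value) > MAGNITUDE_MASK:
        raise ValueError("magnitude does not fit in 15 bits")
    return (SIGN_BIT if value < 0 else 0) | abs(value)


def from_sign_magnitude16(word: int) -> int:
    """Decode; note that 0x8000 ("minus zero") decodes to 0."""
    magnitude = word & MAGNITUDE_MASK
    return -magnitude if word & SIGN_BIT else magnitude


def multiplex(channels: list[list[int]]) -> list[int]:
    """Scan the channels in turn, one sample from each per scan, as a
    16:1 multiplexer feeding a single A/D converter would."""
    n = min(len(ch) for ch in channels)
    return [to_sign_magnitude16(ch[i]) for i in range(n) for ch in channels]
```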

C. PROGRAMMING, OPERATION AND EXECUTION

The MAP-300 can utilize not only the parallel operation of
the adder and multiplier in the APU, but also the parallel
sub-processor operation of the APS, HIS, IOS, APU and CSPU
to increase total throughput. The programmer, by breaking
the problem into smaller independent programs of addressing,
arithmetic, I/O and management, can theoretically program
the entire problem more easily than by adhering to
internal communication protocol and flags [18]. The
respective programs should be easier to write, with much of
the increase in overhead due to the added handshaking and
protocol requirements being assimilated by the executive
[16].



CSPI recommends that a modified top-down programming
technique be used, writing the APU routine first
to ensure the optimum execution speed and then adding the other
necessary routines (generally just the APS routines) to
ensure the information is present when the APU needs it.
The APU should be programmed to treat subsequent sample sets
in alternate adder/multiplier modules and to arrange data so
that as many adds as possible can be "hidden" [16]. By
proper execution sequencing, total time can be shortened to
equal the time to multiply only, with all other operations
"hidden" under these multiplies. This "hiding" operation
becomes easier in the MAP-300 than in the AP-120B, since
cycles need not be counted and NOPs need not be inserted
for unused cycles, owing to flags being set to signal the
availability of resources [16]. The programmer must be
aware that the timing is not absolute; therefore the
executive will tightly control synchronization by flags to
ensure one adder/multiplier does not get ahead of the other.



The programs are initially loaded from the host to the
MAP via the operating system interface and driver. The
MAPDVR.MAC routine makes the standard interface through the
[...]-word Driver Control Block (DCB) [36]. When the Central
System Processing Unit is initialized by the host, it will
load the other sub-processor programs and commence program
execution.

Subsequent MAP commands are sent to the MAP from the
host via Function Control Blocks (FCB), which require host
intervention to send. (Function lists and the MPWHL macro
treat multiple FCBs as a single entity.) These FCBs
transmit host-to-MAP status, interrupts and functions to
perform, and can be queued in the HIS buffer. When it is no
longer necessary for the host to send or receive an FCB, it
can perform other operations [35]. Therefore, with
efficient use of the IOS and the possibility of stringing
MAPs in series, the host can be free to either perform other
tasks or act as a system monitor.

VI



DISCUSSION OF FINDINGS 



In the test bed, the PDP-11/34 was chosen to perform
the front-end functions, which consisted of buffering the
data, formatting it and then passing it to the array
processor or mass storage device (or from the mass storage
device to the array processor). This limited front-end
input function did not dictate that the computer be
large, and the choice of the PDP-11/34 computer for this
application seems adequate. The PDP-11/04 would normally
have enough speed to handle the necessary operations but
may be unsatisfactory, since it does not have a resident
memory control and protection routine to ease the
programmer's burden and help ensure system integrity, nor
does it contain the 2K cache memory to increase speed. A
computer larger than the PDP-11/34 may not increase the
efficiency of the system, although it would increase the
cost.



The test bed utilized the PDP-11/70 for the output
computer. The output computer would be required to receive
information from the array processor, manipulate the data
and store it for future display on one or more devices. For
this application, the PDP-11/70 seems best for several
reasons. The system is much like the 11/34, except that the
current maximum memory is 2 megabytes, allowing better
utilization of information. There are dedicated paths to
high-performance storage devices that would allow more
information to be processed per unit of time. To further
process arrays for output, a 32-bit or a 64-bit
floating-point arithmetic unit is available. The PDP-11/70
gives large-computer performance and expansion capabilities
with the cost and space requirements of smaller units [51].
Using the same manufacturer for the output function as was
used for the input function reduces interface problems and
contributes to the proficiency of the programmers by
increasing overall knowledge of the architecture.

The proposed test bed uses of the 11/34 and 11/70 can
be greatly modified by the choice of the array processor.
The MAP-300 utilizing an Analog Data Acquisition Module
and/or I/O Scroll can eliminate the need for the input
functions (including 16-channel analog-to-digital
conversion), thereby permitting the 11/70 (or possibly a
less costly model) to perform the input, output and monitor
functions in the test bed. In fact, the 11/70 will probably
be large enough and fast enough to facilitate combining all
subsystems, except the display subsystem, under one
computer. The 11/34 and 11/70 combination should provide
for the full range of computers necessary to properly
emulate and evaluate just how much computer capability will
actually be needed for any specific application.

The question arises as to which is the best array
processor for the application. The AP-120B is synchronous
and therefore, some may say, safer. It offers a 38-bit word,
which could mean greater accuracy; more standard library
functions (such as vector log base 10 and vector log base e);
and a 3500-hour mean time before failure. The MAP-300 is a
newer system which, due to the minimal host involvement, three
separate busses, I/O Scrolls and the ADAM, can provide greater
long-run throughput and more flexibility.

For the non-real-time environment, where simple
programming and host involvement can be tolerated, the
AP-120B may be a good choice. It can provide facilities to
tailor algorithms to specific needs; these facilities are
not so complex as to tax the normal programmer. However,
new programs cannot be added directly to the AP math library
(APMATH) but must be linked and loaded for every usage, as
would any application program. This creates an excessive
time overhead. Therefore, the AP-120B should be used only
where simplicity and ease of use are paramount and utility
can be sacrificed.

For applications requiring real-time computations
(which the test bed most likely will eventually demand),
innovative design, high throughput rates and generally
greater flexibility, the MAP-300 provides the answer. The
improved performance, in both array-processing potential and
computer availability, is offset by the increased cost of
program development if non-library routines must be written.
These routines, however, may be added to the library,
effectively reducing overhead. Reference [23] reports that
the MAP-300 also complies with MIL-E-16400, MIL-E-5400,
MIL-STD-461A, MIL-STD-704B and MIL-STD-1399.

During the installation of the MAP-300 at the Naval
Postgraduate School, it was noted that the installation
documentation was extremely poor. As of this writing, three
weeks have been required to install the system, mainly
because of the poor documentation in the installation package
received with the unit. Not only was the package incomplete,
but changes had been made to the software that were not
reflected in the original documentation, nor was an errata
sheet provided.

It is realized that for many companies involved in data
processing equipment manufacture, documentation is not a
chief concern. However, CSPI's installation documentation
seems far inferior to what would reasonably be expected. This
situation made a thorough test of system operation impossible
and allowed only a cursory review.

Even with the evident shortcomings of the documents,
theoretically the MAP-300 is far superior to the AP-120B.
If CSPI would upgrade their documentation and perform the
installation at the site, their sometimes negative public
image could be eliminated and confidence in their equipment
could be increased. It must be noted, however, that ref [IPJ
and the publication "Simple Notation For Array Processing,
Version II, Reference Manual" are excellently written.
Therefore, in the following discussion, the use of the
MAP-300 will be assumed. I will now look at each subsystem
closely and attempt to determine alternate designs.



The analog subsystem obtains data from one of four
sources: a time code reader/generator, a 14-track recorder
(Honeywell 96), a signal synthesizer (Rockland 5100) and/or a
noise generator (HP 3722A). Up to 128 channels of input are
amplified and sent through a programmable matrix switch,
which produces 32 output channels feeding a programmable
32-channel filter. These analog signals then leave the
analog subsystem to be input to the signal processing
subsystem.



The AM-5400 analog-to-digital converter performs a 12-bit
A/D conversion, and the digitized data are then loaded onto
the Ampex Megastore mass storage device through the PDP-11/34
computer. The output of the array processor will then be sent
to the data processing subsystem.



I suggest it may be easier, more flexible and cheaper to
input the 32 channels as before to the programmable filter,
but then the 32 channels may be better handled by two Analog
Data Acquisition Modules, either directly into the MAP for
processing or via an I/O Scroll (model 3) to the PDP-11/70
storage devices for future use. This will eliminate the
expense of the A/D converter, the Ampex Megastore and the
PDP-11/34, but, more importantly, it will make real-time
calculation relatively easy. Once the MAP-300 is started, it
can run without host intervention until interrupted, and with
an assumed input rate of 40 kHz the system should not be
taxed. The output of the MAP can then be sent directly to the
data processing subsystem. The entire system can also be less
complex, affording easier system development.

Assume that a hypothetical system with a 40 kHz input
requires an FFT and a discrete digital filter to be applied
to the data. A 1024-point real to 512-point complex Fourier
transform requires 3.0 milliseconds [23], and a 40 kHz input
rate would require 39.1 FFTs per second on average (40,000 /
1024 = 39.06). This would consume 117.3 milliseconds, and
assuming a 50 percent overhead yields 175.95 milliseconds to
perform the Fourier transform. Discrete filtering would
require another 39.1 x (1024 x (2 x 500 nanoseconds + 12 x 70
nanoseconds)), or 73.67 milliseconds. Again assuming 50
percent overhead, 110.51 milliseconds would be necessary for
the filtering. The total time consumed by the two functions
would be 286.5 milliseconds, leaving 713.5 milliseconds of
each second for other work. (Fifty percent overhead is an
over-estimation.) Loading data into the MAP-300 would be
hidden behind the FFT operation (except for the initial case)
and would not contribute to overall execution time.
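The MAP-300 time budget above can be reproduced with a few lines of arithmetic. The sketch below is a hypothetical illustration, not part of the original analysis: it assumes only the figures stated in the text (3.0 ms per 1024-point real FFT, 2 x 500 ns + 12 x 70 ns of filter work per sample, and a 50 percent overhead margin), and the function name `map300_budget` is invented for the example.

```python
# Hypothetical sketch: MAP-300 time budget for a 40 kHz input,
# using only the figures quoted in the text.

SAMPLE_RATE_HZ = 40_000     # assumed input rate
FFT_SIZE = 1024             # real samples per FFT frame
FFT_TIME_MS = 3.0           # 1024-point real FFT time [23]
FILTER_NS_PER_SAMPLE = 2 * 500 + 12 * 70  # per-sample filter cost, as in the text
OVERHEAD = 1.5              # 50 percent overhead margin


def map300_budget(sample_rate_hz: float = SAMPLE_RATE_HZ) -> dict:
    """Return milliseconds of MAP-300 time used per second of input."""
    # 40,000 / 1024 = 39.0625, rounded to 39.1 as in the text
    ffts_per_s = round(sample_rate_hz / FFT_SIZE, 1)
    fft_ms = ffts_per_s * FFT_TIME_MS
    filter_ms = ffts_per_s * FFT_SIZE * FILTER_NS_PER_SAMPLE / 1e6
    total_ms = OVERHEAD * (fft_ms + filter_ms)
    return {
        "ffts_per_s": ffts_per_s,
        "fft_ms": fft_ms,
        "filter_ms": filter_ms,
        "total_ms": total_ms,
        "spare_ms": 1000.0 - total_ms,
    }


if __name__ == "__main__":
    for name, value in map300_budget().items():
        print(f"{name:>10}: {value:8.2f}")
```

With these constants the sketch gives about 286.5 ms used and 713.5 ms spare per second of input, matching the hand calculation; changing `sample_rate_hz` shows how quickly that margin shrinks at higher input rates.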

This would effectively eliminate the entire signal
processing subsystem with the exception of the MAP-300. The
PDP-11/70 computer in the data processing subsystem could
control the MAP along with its other intended function of
controlling the display subsystem. Any storage necessary for
output or for taped input data could be handled by the tapes
and disks associated with the 11/70, and execution could be
performed on the MAP-300 along with the above calculations.
However, for expanded uses not specifically addressed here,
this configuration of only one MAP and no PDP-11/34 may have
to be modified to accommodate new requirements if these are
significantly larger.

If, after extensive testing, the MAP-300 proves to be too
costly due to unreliable software, the AP-120B can perform
the same functions, although at an increased hardware and
time cost.

For example, for the AP-120B to perform the above real
to complex FFT requires 5.08 milliseconds for the FFT,
0.8 microseconds to rescale and 1.7 microseconds to reformat
the result, for a total of 5.09 milliseconds per 1024-sample
FFT. To this must be added 100 to 1000 microseconds of
overhead for each of the four call statements: get data from
the AP-120B (APGET), put data into the AP-120B (APPUT), real
to complex FFT (RFFT) and real FFT scale and format (RFFTSC).
I will use the arithmetic average of 550 microseconds per
call, for an added 2.2 milliseconds, resulting in a subtotal
of 7.29 milliseconds per FFT. APPUT and APGET have no
specific times in ref [61^, but according to Floating Point
Systems the PDP-11 interface transfer rate is 750 kHz. This
would therefore require approximately 2.67 milliseconds for
the 1024-element transfers, giving a total of 9.96
milliseconds for each of the 39.1 FFTs per second. This
results in a 389.5 millisecond execution and transfer time.
Again allowing a 50 percent overhead safety margin, the total
becomes approximately 584 milliseconds per second. To perform
the discrete filtering would require an additional APGET,
APPUT, RFFT and RFFTSC, as well as a vector multiply (VMUL)
and a complex vector multiply (CVMUL), bringing the time to
compute one second's worth of data to well over one second.
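The AP-120B figures can be checked the same way. This is again a hypothetical sketch: the constants are the values used in the text (5.09 ms per RFFT/RFFTSC sequence, four calls at an average of 550 microseconds, and the approximate 2.67 ms transfer time at 750 kHz), while the filtering pass is assumed, for illustration only, to cost the same per frame as the FFT pass. VMUL and CVMUL are ignored because no timings for them are given, so the filtered result is a lower bound. The names `frame_ms` and `ap120b_budget` are invented for the example.

```python
# Hypothetical sketch: AP-120B time budget for the same 40 kHz input,
# using the per-frame figures quoted in the text.

SAMPLE_RATE_HZ = 40_000
FFT_SIZE = 1024
RFFT_MS = 5.09              # RFFT + rescale + reformat, per 1024-sample frame
CALL_OVERHEAD_MS = 0.55     # average of 100 to 1000 microseconds per call
CALLS_PER_PASS = 4          # APGET, APPUT, RFFT, RFFTSC
TRANSFER_MS = 2.67          # approximate host transfer time per frame (750 kHz)
OVERHEAD = 1.5              # 50 percent safety margin


def frame_ms() -> float:
    """Execution, call overhead and transfer time for one FFT frame."""
    return RFFT_MS + CALLS_PER_PASS * CALL_OVERHEAD_MS + TRANSFER_MS


def ap120b_budget(with_filter_pass: bool = False) -> float:
    """Milliseconds of AP-120B time needed per second of input.

    With with_filter_pass=True, the filtering pass is ASSUMED to cost one
    more frame_ms() per frame; VMUL and CVMUL are not counted, so the
    result is a lower bound.
    """
    ffts_per_s = round(SAMPLE_RATE_HZ / FFT_SIZE, 1)  # 39.1
    passes = 2 if with_filter_pass else 1
    return OVERHEAD * passes * ffts_per_s * frame_ms()


if __name__ == "__main__":
    print(f"per frame:     {frame_ms():.2f} ms")
    print(f"FFT only:      {ap120b_budget():.1f} ms per second")
    print(f"FFT + filter: >{ap120b_budget(True):.1f} ms per second")
```

With these constants the FFT alone needs roughly 584 ms per second and the two passes together at least roughly 1168 ms, which is consistent with the conclusion that a single AP-120B cannot keep up.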

Therefore another AP-120B must be installed to ensure
that speed requirements are met. Also, since the host
computer must be interrupted many times, it may be necessary
to retain the PDP-11/34 in the signal processing subsystem.
There is also the consideration that a custom-written math
routine cannot be loaded into the math library, which will
generate considerable overhead each time it is called. (The
amount of this overhead time is system-dependent.)



CONCLUSIONS AND RECOMMENDATIONS






The test bed as proposed seems to be a workable design,
although for most applications a more efficient and
economical architecture could be constructed.

For many uses, the PDP-11/34 computer and the AM-5400 A/D
converter seem unnecessary when the MAP-300 array processor
is used. The Ampex Megastore may be required for a few
applications but would not be suitable for the majority of
applications (including real-time), since a disk peripheral
attached to the PDP-11/70 would be cheaper and still perform
the same functions.

It is felt that the increase in complexity and possible
confusion in using the MAP-300 rather than the AP-120B can be
overshadowed by the reduction in equipment required by the
MAP-300. This increased proficiency should be felt even more
strongly (assuming a normal learning curve) with subsequent
installations. Also, with the time saved in execution, extra
calculations could be performed on the MAP in a real-time
environment, thereby increasing efficiency, operability and
spectrum.

It is recommended that further tests be conducted using